From hongbin034 at gmail.com Wed May 1 01:33:44 2019 From: hongbin034 at gmail.com (Hongbin Lu) Date: Tue, 30 Apr 2019 21:33:44 -0400 Subject: [Zun] openstack appcontainer create Error In-Reply-To: References: Message-ID: Hi Alejandro, It looks your etcd cluster failed to elect a leader. You might want to check your etcd log for details, and bring your etcd cluster back to healthy state. Unfortunately, I don't have operational experience with etcd. You would need to look at their admin guide for help: https://coreos.com/etcd/docs/latest/v2/admin_guide.html . In the worst case, remove and re-install etcd could work. Best regards, Hongbin On Tue, Apr 30, 2019 at 5:32 PM Alejandro Ruiz Bermejo < arbermejo0417 at gmail.com> wrote: > Hi, i'm installing Zun in Openstack Queens with Ubuntu 18.04.1 LTS, i > already have configured docker and kuyr-libnetwork. I'm following the guide > at https://docs.openstack.org/zun/queens/install/index.html. I followed > all the steps of the installation at controller node and everything > resulted without problems. > > After finished the installation direction at compute node the *systemctl > status zun-compute* have the following errors > > root at compute /h/team# systemctl status zun-compute > ● zun-compute.service - OpenStack Container Service Compute Agent > Loaded: loaded (/etc/systemd/system/zun-compute.service; enabled; > vendor preset: enabled) > Active: active (running) since Tue 2019-04-30 16:46:56 UTC; 4h 26min ago > Main PID: 2072 (zun-compute) > Tasks: 1 (limit: 4915) > CGroup: /system.slice/zun-compute.service > └─2072 /usr/bin/python /usr/local/bin/zun-compute > > Apr 30 16:46:56 compute systemd[1]: Started OpenStack Container Service > Compute Agent. > Apr 30 16:46:57 compute zun-compute[2072]: 2019-04-30 16:46:57.929 2072 > INFO zun.cmd.compute [-] Starting server in PID 2072 > Apr 30 16:46:57 compute zun-compute[2072]: 2019-04-30 16:46:57.941 2072 > INFO zun.container.driver [-] Loading container driver > 'docker.driver.DockerDriver' > Apr 30 16:46:58 compute zun-compute[2072]: 2019-04-30 16:46:58.028 2072 > INFO zun.container.driver [-] Loading container driver > 'docker.driver.DockerDriver' > Apr 30 16:48:33 compute zun-compute[2072]: 2019-04-30 16:48:33.645 2072 > INFO zun.image.driver [req-7e0b8325-1e09-4410-80f4-af807cbc0420 > a16c6ef0319b4643a4ec8e56a1d025cb 59065d8f970b467aa94ef7b35f1edab5 default - > -] Loading container image driver 'glance' > Apr 30 16:48:33 compute zun-compute[2072]: 2019-04-30 16:48:33.911 2072 > INFO zun.image.driver [req-7e0b8325-1e09-4410-80f4-af807cbc0420 > a16c6ef0319b4643a4ec8e56a1d025cb 59065d8f970b467aa94ef7b35f1edab5 default - > -] Loading container image driver 'glance' > Apr 30 16:48:35 compute zun-compute[2072]: 2019-04-30 16:48:35.455 2072 > INFO zun.image.driver [req-7e0b8325-1e09-4410-80f4-af807cbc0420 > 16c6ef0319b4643a4ec8e56a1d025cb 59065d8f970b467aa94ef7b35f1edab5 default - > -] Loading container image driver 'glance' > Apr 30 16:48:35 compute zun-compute[2072]: 2019-04-30 16:48:35.939 2072 > ERROR zun.image.glance.driver [req-7e0b8325-1e09-4410-80f4-af807cbc0420 > a16c6ef0319b4643a4ec8e56a1d025cb 59065d8f970b467aa94ef7b35f1edab5 default - > -] Imae cirros was not found in glance: ImageNotFound: Image cirros could > not be found. 
> Apr 30 16:48:35 compute zun-compute[2072]: 2019-04-30 16:48:35.940 2072 > INFO zun.image.driver [req-7e0b8325-1e09-4410-80f4-af807cbc0420 > a16c6ef0319b4643a4ec8e56a1d025cb 59065d8f970b467aa94ef7b35f1edab5 default - > -] Loading container image driver 'docker' > Apr 30 16:48:55 compute zun-compute[2072]: 2019-04-30 16:48:55.011 2072 > ERROR zun.compute.manager [req-7bfa764a-45b8-4e2f-ac70-84d8bb71b135 - - - - > -] Error occurred while calling Docker create API: Docker internal error: > 500 Server Error: Internal Server Error ("failed to update store for object > typpe *libnetwork.endpointCnt: client: etcd member http://controller:2379 > has no leader").: DockerError: Docker internal error: 500 Server Error: > Internal Server Error ("failed to update store for object type > *libnetwork.endpointtCnt: client: etcd member http://controller:2379 has > no leader"). > > Also *systemctl status docker* show the next output > > root at compute /h/team# systemctl status docker > ● docker.service - Docker Application Container Engine > Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor > preset: enabled) > Drop-In: /etc/systemd/system/docker.service.d > └─docker.conf, http-proxy.conf, https-proxy.conf > Active: active (running) since Tue 2019-04-30 16:46:25 UTC; 4h 18min ago > Docs: https://docs.docker.com > Main PID: 1777 (dockerd) > Tasks: 21 > CGroup: /system.slice/docker.service > └─1777 /usr/bin/dockerd --group zun -H tcp://compute:2375 -H > unix:///var/run/docker.sock --cluster-store etcd://controller:2379 > > Apr 30 16:46:20 compute dockerd[1777]: > time="2019-04-30T16:46:20.815305836Z" level=warning msg="Your kernel does > not support cgroup rt runtime" > Apr 30 16:46:20 compute dockerd[1777]: > time="2019-04-30T16:46:20.815933695Z" level=info msg="Loading containers: > start." > Apr 30 16:46:24 compute dockerd[1777]: > time="2019-04-30T16:46:24.378526837Z" level=info msg="Default bridge > (docker0) is assigned with an IP address 17 > Apr 30 16:46:24 compute dockerd[1777]: > time="2019-04-30T16:46:24.572558877Z" level=info msg="Loading containers: > done." > Apr 30 16:46:25 compute dockerd[1777]: > time="2019-04-30T16:46:25.198101219Z" level=info msg="Docker daemon" > commit=e8ff056 graphdriver(s)=overlay2 vers > Apr 30 16:46:25 compute dockerd[1777]: > time="2019-04-30T16:46:25.198211373Z" level=info msg="Daemon has completed > initialization" > Apr 30 16:46:25 compute dockerd[1777]: > time="2019-04-30T16:46:25.232286069Z" level=info msg="API listen on > /var/run/docker.sock" > Apr 30 16:46:25 compute dockerd[1777]: > time="2019-04-30T16:46:25.232318790Z" level=info msg="API listen on > 10.8.9.58:2375" > Apr 30 16:46:25 compute systemd[1]: Started Docker Application Container > Engine. 
> Apr 30 16:48:55 compute dockerd[1777]: > time="2019-04-30T16:48:55.009820439Z" level=error msg="Handler for POST > /v1.26/networks/create returned error: failed to update store for object > type *libnetwork.endpointCnt: client: etcd member http://controller:2379 > has no leader" > > > When i try to launch an app container as the guide says it shows an Error > state and when i run opentack appcontainer show this is the reason of the > error > status_reason | Docker internal error: 500 Server Error: Internal > Server Error ("failed to update store for object type > *libnetwork.endpointCnt: client: etcd member http://controller:2379 has > no leader") > > I had some troubles installing Kuryr-libnetwork besides that i didn't had > any othet problem during the installation of Zun > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cdent+os at anticdent.org Wed May 1 03:33:56 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Tue, 30 Apr 2019 21:33:56 -0600 (MDT) Subject: [placement][nova][ironic][blazar][ptg] Placement PTG Agenda Message-ID: Near the top of the placement ptg etherpad [1] I've sketched out a schedule for the end of this week for those who happen to be in Denver. Since so many of the placement team will be required elsewhere, it is pretty thin. I think this is okay because a) we got quite a bit accomplished during the pre-PTG emails, b) the main things we need to discuss [2] will be strongly informed by other discussion in the week and need to be revisited several times. The summary of the schedule is: Thursday: 14:30-Beer: In the nova room doing cross project stuff Friday: Morning: wherever you need to be, often nova room Afternoon: In the placement room (for those who can) to capture and clarify results of the Thursday session and Friday morning and topics as people present allows. Saturday: Morning: Ironic/Blazar/Placement/Anyone else interested in using placement. Afternoon: Capture and clarify, retrospective, refactoring goals, hacking. The topics in [2] (and all the related emails) will be mixed into Thursday, Friday and Saturday afternoons. Thank you for whatever time you're able to make available. If you have conflicts, don't worry, everything will get summarized later and if it is properly important will come up again. [1] https://etherpad.openstack.org/p/placement-ptg-train [2] http://lists.openstack.org/pipermail/openstack-discuss/2019-April/005715.html -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent tw: @anticdent From Arvind.Kumar at ril.com Wed May 1 06:20:27 2019 From: Arvind.Kumar at ril.com (Arvind Kumar) Date: Wed, 1 May 2019 06:20:27 +0000 Subject: [External]Re: [Ceilometer]: cpu_util meter not being calculated as expected leading to delay in scaling In-Reply-To: References: Message-ID: Hi Trinh, I am using OpenStack Queens release on Ubuntu setup. Regards, Arvind. From: Trinh Nguyen Sent: 26 April 2019 07:31 To: Arvind Kumar Cc: openstack-discuss at lists.openstack.org Subject: [External]Re: [Ceilometer]: cpu_util meter not being calculated as expected leading to delay in scaling The e-mail below is from an external source. Please do not open attachments or click links from an unknown or suspicious origin. Hi Arvind, Could you please tell us which release of Ceilometer that you are referring to? Bests, On Wed, Apr 24, 2019 at 4:55 PM Arvind Kumar > wrote: Hi, A design issue is observed in ceilometer service of Openstack. Setup include multiple compute nodes and 3 controller nodes. 
Meters from each compute node are sent to all the 3 ceilometer instances via RabbitMQ in round robin fashion at an interval of 10 min. After transformation of cumulative cpu meter data, cpu_util is generated by ceilometer instance at controller node and is published to the http address configured in ceilometer pipeline configuration. cpu_util is used by the application to take the decision if scaling of VM needs to be triggered or not. Ceilometer instance calculates cpu_util for a VM from the difference between cumulative cpu usage of VM at two timestamp divided by the timestamp difference. Let’s say 1 compute node send the cumulative cpu usage of a VM (C1, C2, C3, C4) at timestamp T1, T2, T3, T4 (difference between any two timestamp is 10 min). Now (C1,T1) & (C4,T4) tuple is received by ceilometer instance 1, (C2,T2) by instance 2, (C3,T3) by instance 3. Here even if CPU usage of VM is increased between T1 & T2, cpu_util is calculated for 30 min duration (T1 & T4) rather than as expected for 10 min. This leads to scaling getting triggered after T4 that too when CPU usage is consistently above the threshold between T1 and T4. Please suggest how could this issue could be resolved. Do we have any solution to bind VM or compute node meter data to specific ceilometer instance for processing? Regards, Arvind. "Confidentiality Warning: This message and any attachments are intended only for the use of the intended recipient(s), are confidential and may be privileged. If you are not the intended recipient, you are hereby notified that any review, re-transmission, conversion to hard copy, copying, circulation or other use of this message and any attachments is strictly prohibited. If you are not the intended recipient, please notify the sender immediately by return email and delete this message and any attachments from your system. Virus Warning: Although the company has taken reasonable precautions to ensure no viruses are present in this email. The company cannot accept responsibility for any loss or damage arising from the use of this email or attachment." -- Trinh Nguyen www.edlab.xyz "Confidentiality Warning: This message and any attachments are intended only for the use of the intended recipient(s). are confidential and may be privileged. If you are not the intended recipient. you are hereby notified that any review. re-transmission. conversion to hard copy. copying. circulation or other use of this message and any attachments is strictly prohibited. If you are not the intended recipient. please notify the sender immediately by return email. and delete this message and any attachments from your system. Virus Warning: Although the company has taken reasonable precautions to ensure no viruses are present in this email. The company cannot accept responsibility for any loss or damage arising from the use of this email or attachment." -------------- next part -------------- An HTML attachment was scrubbed... URL: From ianyrchoi at gmail.com Wed May 1 10:46:06 2019 From: ianyrchoi at gmail.com (Ian Y. Choi) Date: Wed, 1 May 2019 19:46:06 +0900 Subject: [PTG][I18n] Additional Virtual PTG scheduling Message-ID: Hello, Although I shared my priorities as I18n PTL during Train cycle [1], I couldn't attend to the PTG at this time. There will be Docs+I18n PTG tomorrow in Denver with lots of discussions for more on cross-project collaboration with Docs team, and other teams, and I wanna join as remote on 16:00-17:00 according to [2] (thanks a lot, Frank!). 
I want to design an additional virtual PTG event, which some of other teams also design something similar as, but would like to plan somewhat differently to reflect I18n team members' geographical & language diversity as much as possible. Any translators, language coordinators, and contributors are welcome with the following cadence: - Please allocate your 30 minutes on May 2 (according to Denver timezone). - Please visit https://ethercalc.openstack.org/i18n-virtual-ptg-train and grasp how I18n Virtual PTG operates. - Choose your best time and write your name, country, preferred comm method, and notes by filling out cells on H19-K66.   I will be online on IRC or Zoom (or please share your best communication method - I will follow as much as possible). This might be something different from general cadence on PTG, but I really hope that I18n team will have better communication through such activities. Please join in the discussion - I will reflect all of opinions as much as possible for better I18n world during this cycle. Note that I purposely marked some of my unavailable time slots but can be adjusted well - believe me, since someone asks me when I sleep (although it is getting harder.. :p ) With many thanks, /Ian [1] http://lists.openstack.org/pipermail/openstack-discuss/2019-March/003757.html [2] http://lists.openstack.org/pipermail/openstack-discuss/2019-April/005668.html From zigo at debian.org Wed May 1 11:44:34 2019 From: zigo at debian.org (Thomas Goirand) Date: Wed, 1 May 2019 13:44:34 +0200 Subject: properly sizing openstack controlplane infrastructure In-Reply-To: <20190430153021.jhdgri7g2nvpn5vj@alle-irre.de> References: <20190430153021.jhdgri7g2nvpn5vj@alle-irre.de> Message-ID: <6448907c-6aaf-2f91-fe77-48e697c7b80f@debian.org> On 4/30/19 5:30 PM, Hartwig Hauschild wrote: > Also: We're currently running Neutron in OVS-DVR-VXLAN-Configuration. > Does that properly scale up and above 50+ nodes It does, that's not the bottleneck. >From my experience, 3 heavy control nodes are really enough to handle 200+ compute nodes. Though what you're suggesting (separating db & rabbitmq-server in separate nodes) is a very good idea. Cheers, Thomas Goirand (zigo) From manuel.sb at garvan.org.au Wed May 1 12:31:17 2019 From: manuel.sb at garvan.org.au (Manuel Sopena Ballesteros) Date: Wed, 1 May 2019 12:31:17 +0000 Subject: how to get best io performance from my block devices Message-ID: <9D8A2486E35F0941A60430473E29F15B017EA658B2@mxdb2.ad.garvan.unsw.edu.au> Dear Openstack community, I would like to have a high performance distributed database running in Openstack vms. I tried attaching dedicated nvme pci devices to the vm but the performance is not as good as I can get from bare metal. 
Bare metal: [root at zeus-54 data]# fio --ioengine=libaio --name=test --filename=test --bs=4k --size=40G --readwrite=randrw --runtime=120 --time_based test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1 fio-3.1 Starting 1 process Jobs: 1 (f=1): [f(1)][100.0%][r=39.5MiB/s,w=39.6MiB/s][r=10.1k,w=10.1k IOPS][eta 00m:00s] test: (groupid=0, jobs=1): err= 0: pid=50892: Wed May 1 22:22:45 2019 read: IOPS=9805, BW=38.3MiB/s (40.2MB/s)(4596MiB/120001msec) slat (usec): min=39, max=6678, avg=94.72, stdev=55.78 clat (nsec): min=450, max=18224, avg=525.83, stdev=120.10 lat (usec): min=39, max=6679, avg=95.36, stdev=55.79 clat percentiles (nsec): | 1.00th=[ 462], 5.00th=[ 478], 10.00th=[ 482], 20.00th=[ 486], | 30.00th=[ 490], 40.00th=[ 494], 50.00th=[ 502], 60.00th=[ 510], | 70.00th=[ 516], 80.00th=[ 532], 90.00th=[ 596], 95.00th=[ 676], | 99.00th=[ 860], 99.50th=[ 1048], 99.90th=[ 1384], 99.95th=[ 2480], | 99.99th=[ 3728] bw ( KiB/s): min= 720, max=40736, per=100.00%, avg=39389.00, stdev=5317.58, samples=239 iops : min= 180, max=10184, avg=9847.23, stdev=1329.39, samples=239 write: IOPS=9799, BW=38.3MiB/s (40.1MB/s)(4594MiB/120001msec) slat (nsec): min=2982, max=106207, avg=4220.09, stdev=980.04 clat (nsec): min=407, max=18130, avg=451.48, stdev=103.71 lat (usec): min=3, max=111, avg= 4.74, stdev= 1.03 clat percentiles (nsec): | 1.00th=[ 414], 5.00th=[ 418], 10.00th=[ 422], 20.00th=[ 430], | 30.00th=[ 434], 40.00th=[ 434], 50.00th=[ 438], 60.00th=[ 438], | 70.00th=[ 442], 80.00th=[ 446], 90.00th=[ 462], 95.00th=[ 588], | 99.00th=[ 700], 99.50th=[ 916], 99.90th=[ 1208], 99.95th=[ 1288], | 99.99th=[ 3536] bw ( KiB/s): min= 752, max=42608, per=100.00%, avg=39366.63, stdev=5355.73, samples=239 iops : min= 188, max=10652, avg=9841.64, stdev=1338.93, samples=239 lat (nsec) : 500=69.98%, 750=28.64%, 1000=0.90% lat (usec) : 2=0.42%, 4=0.04%, 10=0.01%, 20=0.01% cpu : usr=2.20%, sys=10.85%, ctx=1176675, majf=0, minf=1372 IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwt: total=1176625,1175958,0, short=0,0,0, dropped=0,0,0 latency : target=0, window=0, percentile=100.00%, depth=1 Run status group 0 (all jobs): READ: bw=38.3MiB/s (40.2MB/s), 38.3MiB/s-38.3MiB/s (40.2MB/s-40.2MB/s), io=4596MiB (4819MB), run=120001-120001msec WRITE: bw=38.3MiB/s (40.1MB/s), 38.3MiB/s-38.3MiB/s (40.1MB/s-40.1MB/s), io=4594MiB (4817MB), run=120001-120001msec Disk stats (read/write): nvme9n1: ios=1174695/883620, merge=0/0, ticks=105502/72225, in_queue=192101, util=99.28% >From vm: [centos at kudu-1 nvme0]$ sudo fio --ioengine=libaio --name=test --filename=test --bs=4k --size=40G --readwrite=randrw --runtime=120 --time_based test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1 fio-3.1 Starting 1 process Jobs: 1 (f=1): [m(1)][100.0%][r=29.2MiB/s,w=29.7MiB/s][r=7487,w=7595 IOPS][eta 00m:00s] test: (groupid=0, jobs=1): err= 0: pid=44383: Wed May 1 12:22:24 2019 read: IOPS=6994, BW=27.3MiB/s (28.6MB/s)(3278MiB/120000msec) slat (usec): min=54, max=20476, avg=115.27, stdev=71.45 clat (nsec): min=1757, max=31476, avg=2163.02, stdev=688.66 lat (usec): min=56, max=20481, avg=118.51, stdev=71.66 clat percentiles (nsec): | 1.00th=[ 1800], 5.00th=[ 1832], 10.00th=[ 1864], 20.00th=[ 1992], | 30.00th=[ 2040], 40.00th=[ 2064], 50.00th=[ 2064], 60.00th=[ 2096], | 
70.00th=[ 2096], 80.00th=[ 2128], 90.00th=[ 2480], 95.00th=[ 2544], | 99.00th=[ 4448], 99.50th=[ 5536], 99.90th=[11072], 99.95th=[12736], | 99.99th=[18560] bw ( KiB/s): min= 952, max=31224, per=100.00%, avg=28153.51, stdev=4126.89, samples=237 iops : min= 238, max= 7806, avg=7038.23, stdev=1031.70, samples=237 write: IOPS=6985, BW=27.3MiB/s (28.6MB/s)(3274MiB/120000msec) slat (usec): min=7, max=963, avg=12.60, stdev= 6.24 clat (nsec): min=1662, max=199250, avg=2030.26, stdev=712.33 lat (usec): min=10, max=970, avg=15.68, stdev= 6.48 clat percentiles (nsec): | 1.00th=[ 1688], 5.00th=[ 1720], 10.00th=[ 1736], 20.00th=[ 1864], | 30.00th=[ 1928], 40.00th=[ 1944], 50.00th=[ 1944], 60.00th=[ 1960], | 70.00th=[ 1960], 80.00th=[ 1992], 90.00th=[ 2352], 95.00th=[ 2384], | 99.00th=[ 4048], 99.50th=[ 4768], 99.90th=[11456], 99.95th=[13120], | 99.99th=[19072] bw ( KiB/s): min= 912, max=31880, per=100.00%, avg=28119.64, stdev=4176.38, samples=237 iops : min= 228, max= 7970, avg=7029.75, stdev=1044.07, samples=237 lat (usec) : 2=51.56%, 4=47.17%, 10=1.03%, 20=0.22%, 50=0.01% lat (usec) : 250=0.01% cpu : usr=4.96%, sys=28.37%, ctx=839307, majf=0, minf=26 IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwt: total=839283,838268,0, short=0,0,0, dropped=0,0,0 latency : target=0, window=0, percentile=100.00%, depth=1 Run status group 0 (all jobs): READ: bw=27.3MiB/s (28.6MB/s), 27.3MiB/s-27.3MiB/s (28.6MB/s-28.6MB/s), io=3278MiB (3438MB), run=120000-120000msec WRITE: bw=27.3MiB/s (28.6MB/s), 27.3MiB/s-27.3MiB/s (28.6MB/s-28.6MB/s), io=3274MiB (3434MB), run=120000-120000msec Disk stats (read/write): nvme0n1: ios=838322/651596, merge=0/0, ticks=83804/22119, in_queue=104773, util=70.18% Is there a way I can get near bare metal performance from my nvme block devices? NOTICE Please consider the environment before printing this email. This message and any attachments are intended for the addressee named and may contain legally privileged/confidential/copyright information. If you are not the intended recipient, you should not read, use, disclose, copy or distribute this communication. If you have received this message in error please notify us at once by return email and then delete both messages. We accept no liability for the distribution of viruses or similar in electronic communications. This notice should not be removed. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rosmaita.fossdev at gmail.com Wed May 1 13:21:26 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Wed, 1 May 2019 07:21:26 -0600 Subject: [glance] [ops] Issue sharing an image with another project (something related to get_image_location) In-Reply-To: References: Message-ID: (Apologies for top-posting.) Hi Massimo, Two things: (1) Please file a glance bug for this. I didn't think the sharing code would touch image locations, but apparently it does. In the bug report, please include your policy settings for *_location and *_member, and also the output of an image-show call for the image you're trying to share, and the log extract. (2) With the policy settings you have for *_location, I don't think that any regular (non-admin) user will be able to download an image or boot an instance from an image, so you should verify those operations. Given what I just said, how do you protect against OSSN-0065? 
The following is from the Rocky release notes [0] (which you may not have seen; this item was merged after 17.0.0, and we haven't done a point release, so they're only available online): "The show_multiple_locations configuration option remains deprecated in this release, but it has not been removed. (It had been scheduled for removal in the Pike release.) Please keep a watch on the Glance release notes and the glance-specs repository to stay informed about developments on this issue. "The plan is to eliminate the option and use only policies to control image locations access. This, however, requires some major refactoring. See the draft Policy Refactor spec [1] for more information. "There is no projected timeline for this change, as no one has been able to commit time to it. The Glance team would be happy to discuss this more with anyone interested in working on it. "The workaround is to continue to use the show_multiple_locations option in a dedicated “internal” Glance node that is not accessible to end users. We continue to recommend that image locations not be exposed to end users. See OSSN-0065 for more information." Sorry for the long quote, but I wanted to take this opportunity to remind people that "The Glance team would be happy to discuss this more with anyone interested in working on it". It's particularly relevant to anyone who will be at the PTG this week -- please look for the Glance team and get a discussion started, because I don't think this item is currently a priority for Train [2]. [0] https://docs.openstack.org/releasenotes/glance/rocky.html#known-issues [1] https://review.opendev.org/#/c/528021/ [2] https://wiki.openstack.org/wiki/PTG/Train/Etherpads On 4/29/19 8:43 AM, Massimo Sgaravatto wrote: > I have a small Rocky installation where Glance is configured with 2 > backends (old images use the 'file' backend while new ones use the rbd > backend, which is the default) > > > show_multiple_locations  is true but I have these settings in policy.json: > > # grep _image_location /etc/glance/policy.json >     "delete_image_location": "role:admin", >     "get_image_location": "role:admin", >     "set_image_location": "role:admin", > > This was done because of: > https://wiki.openstack.org/wiki/OSSN/OSSN-0065 > > > If an unpriv user tries to share a private image: > > $ openstack image add project 3194a04b-ffc8-4aaf-b6c8-adc24e3d3fe6 > e81df4c0b493439abb8b85bfd4cbe071 > 403 Forbidden: Not allowed to create members for image > 3194a04b-ffc8-4aaf-b6c8-adc24e3d3fe6. (HTTP 403) > > In the log file it looks like that the problem is related to the > get_image_location operation: > > /var/log/glance/api.log:2019-04-29 16:06:54.523 8220 WARNING > glance.api.v2.image_members [req-dd93cdc9-767d-4c51-8e5a-edf746c02264 > ab573ba3ea014b778193b6922ffffe6d ee1865a76440481cbcff08544c7d580a - > default default] Not allowed to create members for image > 3194a04b-ffc8-4aaf-b6c8-adc24e3d3fe6.: Forbidden: You are not authorized > to complete get_image_location action. 
> > > But actually the sharing operation succeeded: > > $ glance member-list --image-id 3194a04b-ffc8-4aaf-b6c8-adc24e3d3fe6 > +--------------------------------------+----------------------------------+---------+ > | Image ID                             | Member ID                      >   | Status  | > +--------------------------------------+----------------------------------+---------+ > | 3194a04b-ffc8-4aaf-b6c8-adc24e3d3fe6 | > e81df4c0b493439abb8b85bfd4cbe071 | pending | > +--------------------------------------+----------------------------------+---------+ > > > Cheers, Massimo From james.page at canonical.com Wed May 1 14:54:02 2019 From: james.page at canonical.com (James Page) Date: Wed, 1 May 2019 08:54:02 -0600 Subject: [ptg][sig][upgrades] Train PTG Upgrades SIG session In-Reply-To: References: Message-ID: Hi All Reminder that the Upgrades SIG session is tomorrow morning (Thursday) in room 201 at the PTG. https://etherpad.openstack.org/p/upgrade-sig-ptg-train I've added slots for our regular agenda topics of Operator Feedback and Deployment Project Updates - so if you are an operator or a developer on one of the numerous deployment projects please add your name to the etherpad along with your proposed topic! Cheers James On Sat, Apr 27, 2019 at 2:41 AM James Page wrote: > Hi All > > I've finally found time to create an etherpad for the Upgrades SIG session > at the upcoming PTG in Denver (on the train on my way to LHR to catch my > flight). > > https://etherpad.openstack.org/p/upgrade-sig-ptg-train > > I've added a few proposed topics but if you're at the PTG (or summit) and > have anything upgrade related to discuss please add your topic to the > etherpad over the next few days - I'll then put together a rough schedule > for our half day of upgrades discussion on Thursday morning in room 201. > > IRC meetings never really got restarted since the last PTG but I know that > the promise of getting together to discuss upgrade successes and challenges > generally appeals to us all based on prior sessions! > > Thanks in advance and see you all in Denver! > > Cheers > > James > -------------- next part -------------- An HTML attachment was scrubbed... URL: From skylightcoder at gmail.com Wed May 1 15:19:13 2019 From: skylightcoder at gmail.com (=?UTF-8?B?R8O2a2hhbiBJxZ5JSw==?=) Date: Wed, 1 May 2019 18:19:13 +0300 Subject: [Nova][Neutron] When Trying to Use Xen Hypervisor on OpenStack, Virtual machines can not get ip Message-ID: Hi Team, I am trying to test hypervisors which OpenStack supported. So I installed 1 controller node , 1 xen compute node and 1 kvm compute node. I installed OpenStack Pike version. My kvm compute node works properly but I have problem about xen compute node. ı installed xen server 7.0 version to my server and on domU I created centos 7 virtual machine and ı installed nova-compute on it. For installing Openstack on xen I followed https://openstack-xenserver.readthedocs.io/en/latest/ guide. When I tried creating virtual machine , virtual machine is created but it has no ip. It didn't get any ip. I have no experince on xen server and ı don't know how ı can solve this problem. I looked at logs but I didn't see any errors. I need your help. I doubt of my neutron config. ı have 2 nics and one for management network and 1 for public network. These are my nics on dom0[ https://imgur.com/a/IrdLoCn]. these are my nics on domU [https://imgur.com/a/5RliHa7]. 
I am sending my ifconfig output on dom0[ http://paste.openstack.org/show/750146/] and domU[ http://paste.openstack.org/show/750147/]. I am also sending my openvswitch_agent.ini file[ http://paste.openstack.org/show/750148/ ]. -------------- next part -------------- An HTML attachment was scrubbed... URL: From skylightcoder at gmail.com Wed May 1 15:22:06 2019 From: skylightcoder at gmail.com (=?UTF-8?B?R8O2a2hhbiBJxZ5JSw==?=) Date: Wed, 1 May 2019 18:22:06 +0300 Subject: [Nova][Neutron] Using Xen hypervisor On Openstack Message-ID: Hi Team, I am trying to test hypervisors which OpenStack supported. So I installed 1 controller node , 1 xen compute node and 1 kvm compute node. I installed OpenStack Pike version. My kvm compute node works properly but I have problem about xen compute node. ı installed xen server 7.0 version to my server and on domU I created centos 7 virtual machine and ı installed nova-compute on it. For installing Openstack on xen I followed https://openstack-xenserver.readthedocs.io/en/latest/ guide. When I tried creating virtual machine , virtual machine is created but it has no ip. It didn't get any ip. I have no experince on xen server and ı don't know how ı can solve this problem. I looked at logs but I didn't see any errors. I need your help. I doubt of my neutron config. ı have 2 nics and one for management network and 1 for public network. These are my nics on dom0[ https://imgur.com/a/IrdLoCn]. these are my nics on domU [https://imgur.com/a/5RliHa7]. I am sending my ifconfig output on dom0[ http://paste.openstack.org/show/750146/] and domU[ http://paste.openstack.org/show/750147/]. I am also sending my openvswitch_agent.ini file[ http://paste.openstack.org/show/750148/ ]. ı am waiting for your help. Thanks. -------------- next part -------------- An HTML attachment was scrubbed... URL: From colleen at gazlene.net Wed May 1 15:21:51 2019 From: colleen at gazlene.net (Colleen Murphy) Date: Wed, 01 May 2019 11:21:51 -0400 Subject: [devstack] Identity URL problem In-Reply-To: References: Message-ID: <3114008b-bf06-4f74-b74a-2c03676cb860@www.fastmail.com> On Tue, Apr 30, 2019, at 10:41, Neil Jerram wrote: > Does anyone know what causes this problem at [1]: > > 2019-04-30 16:34:03.137 | +++ functions-common:oscwrap:2346 : command > openstack role add admin --user neutron --project service --user-domain > Default --project-domain Default > 2019-04-30 16:34:03.139 | +++ functions-common:oscwrap:2346 : openstack > role add admin --user neutron --project service --user-domain Default > --project-domain Default > 2019-04-30 16:34:04.331 | Failed to discover available identity > versions when contacting http://104.239.175.234/identity. Attempting to > parse version from URL. > 2019-04-30 16:34:04.331 | Could not find versioned identity endpoints > when attempting to authenticate. Please check that your auth_url is > correct. Not Found (HTTP 404) > > [1] > http://logs.openstack.org/79/638479/3/check/networking-calico-tempest-dsvm/5431e4b/logs/devstacklog.txt.gz > > I think there are loads of uses of that URL, before where the > networking-calico plugin uses it, so I can't see why the plugin's use > hits that error. > > Thanks, > Neil > That error usually means that keystone couldn't be reached at all. 
Looking through the devstack log, it looks like keystone is not even enabled: http://logs.openstack.org/79/638479/3/gate/networking-calico-tempest-dsvm/5888def/logs/devstacklog.txt.gz#_2019-04-11_11_05_00_946 Colleen From neil at tigera.io Wed May 1 15:28:51 2019 From: neil at tigera.io (Neil Jerram) Date: Wed, 1 May 2019 16:28:51 +0100 Subject: [devstack] Identity URL problem In-Reply-To: <3114008b-bf06-4f74-b74a-2c03676cb860@www.fastmail.com> References: <3114008b-bf06-4f74-b74a-2c03676cb860@www.fastmail.com> Message-ID: On Wed, May 1, 2019 at 4:21 PM Colleen Murphy wrote: > On Tue, Apr 30, 2019, at 10:41, Neil Jerram wrote: > > Does anyone know what causes this problem at [1]: > > > > 2019-04-30 16:34:03.137 | +++ functions-common:oscwrap:2346 : command > > openstack role add admin --user neutron --project service --user-domain > > Default --project-domain Default > > 2019-04-30 16:34:03.139 | +++ functions-common:oscwrap:2346 : openstack > > role add admin --user neutron --project service --user-domain Default > > --project-domain Default > > 2019-04-30 16:34:04.331 | Failed to discover available identity > > versions when contacting http://104.239.175.234/identity. Attempting to > > parse version from URL. > > 2019-04-30 16:34:04.331 | Could not find versioned identity endpoints > > when attempting to authenticate. Please check that your auth_url is > > correct. Not Found (HTTP 404) > > > > [1] > > > http://logs.openstack.org/79/638479/3/check/networking-calico-tempest-dsvm/5431e4b/logs/devstacklog.txt.gz > > > > I think there are loads of uses of that URL, before where the > > networking-calico plugin uses it, so I can't see why the plugin's use > > hits that error. > > > > Thanks, > > Neil > > > > That error usually means that keystone couldn't be reached at all. Looking > through the devstack log, it looks like keystone is not even enabled: > > > http://logs.openstack.org/79/638479/3/gate/networking-calico-tempest-dsvm/5888def/logs/devstacklog.txt.gz#_2019-04-11_11_05_00_946 Many thanks Colleen, I'll explicitly enable keystone and see if that helps. Do you know if that's a recent change, that keystone used to be enabled by default, and now requires explicit enabling? Best wishes, Neil -------------- next part -------------- An HTML attachment was scrubbed... URL: From neil at tigera.io Wed May 1 16:10:39 2019 From: neil at tigera.io (Neil Jerram) Date: Wed, 1 May 2019 17:10:39 +0100 Subject: [devstack] Identity URL problem In-Reply-To: References: <3114008b-bf06-4f74-b74a-2c03676cb860@www.fastmail.com> Message-ID: On Wed, May 1, 2019 at 4:28 PM Neil Jerram wrote: > On Wed, May 1, 2019 at 4:21 PM Colleen Murphy wrote: > >> On Tue, Apr 30, 2019, at 10:41, Neil Jerram wrote: >> > Does anyone know what causes this problem at [1]: >> > >> > 2019-04-30 16:34:03.137 | +++ functions-common:oscwrap:2346 : command >> > openstack role add admin --user neutron --project service --user-domain >> > Default --project-domain Default >> > 2019-04-30 16:34:03.139 | +++ functions-common:oscwrap:2346 : openstack >> > role add admin --user neutron --project service --user-domain Default >> > --project-domain Default >> > 2019-04-30 16:34:04.331 | Failed to discover available identity >> > versions when contacting http://104.239.175.234/identity. Attempting >> to >> > parse version from URL. >> > 2019-04-30 16:34:04.331 | Could not find versioned identity endpoints >> > when attempting to authenticate. Please check that your auth_url is >> > correct. 
Not Found (HTTP 404) >> > >> > [1] >> > >> http://logs.openstack.org/79/638479/3/check/networking-calico-tempest-dsvm/5431e4b/logs/devstacklog.txt.gz >> > >> > I think there are loads of uses of that URL, before where the >> > networking-calico plugin uses it, so I can't see why the plugin's use >> > hits that error. >> > >> > Thanks, >> > Neil >> > >> >> That error usually means that keystone couldn't be reached at all. >> Looking through the devstack log, it looks like keystone is not even >> enabled: >> >> >> http://logs.openstack.org/79/638479/3/gate/networking-calico-tempest-dsvm/5888def/logs/devstacklog.txt.gz#_2019-04-11_11_05_00_946 > > > Many thanks Colleen, I'll explicitly enable keystone and see if that helps. > > Do you know if that's a recent change, that keystone used to be enabled by > default, and now requires explicit enabling? > I'm sorry, I've spotted what the real problem is now, and it doesn't implicate any change to the enablement of keystone. But many thanks again for your input, which was the hint I needed to see the problem! (networking-calico's devstack plugin supports running on multiple nodes, and has a heuristic to differentiate between when it's the first node being set up - with both control and compute functions - and when it's a subsequent node - with compute only. That heuristic had gone wrong, so CI was installing a compute-only node.) Best wishes, Neil -------------- next part -------------- An HTML attachment was scrubbed... URL: From amy at demarco.com Wed May 1 16:45:27 2019 From: amy at demarco.com (Amy) Date: Wed, 1 May 2019 10:45:27 -0600 Subject: [OpenStack-Ansible][OSA] Team Dinner Denver In-Reply-To: References: Message-ID: <8B54A126-2501-4EC3-B0BF-E5FB92FD3CC1@demarco.com> We will be going to the 5280 Burger Bar at 7:00pm. We have a private room!! Hope to see everyone there! Amy (spotz) Sent from my iPhone > On Apr 28, 2019, at 2:34 PM, Amy Marrich wrote: > > We are looking at having our official team dinner Wednesday evening. Please visit this etherpad: > > https://etherpad.openstack.org/p/osa-team-dinner-plan > > To add your name and vote on restaurants so I can get a head count and make a reservation. > > Thanks, > > Amy (spotz) -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeremyfreudberg at gmail.com Wed May 1 16:45:38 2019 From: jeremyfreudberg at gmail.com (Jeremy Freudberg) Date: Wed, 1 May 2019 12:45:38 -0400 Subject: [ironic][neutron][ops] Ironic multi-tenant networking, VMs Message-ID: Hi all, I'm wondering if anyone has any best practices for Ironic bare metal nodes and regular VMs living on the same network. I'm sure if involves Ironic's `neutron` multi-tenant network driver, but I'm a bit hazy on the rest of the details (still very much in the early stages of exploring Ironic). Surely it's possible, but I haven't seen mention of this anywhere (except the very old spec from 2015 about introducing ML2 support into Ironic) nor is there a gate job resembling this specific use. Ideas? Thanks, Jeremy From luka.peschke at objectif-libre.com Wed May 1 17:33:13 2019 From: luka.peschke at objectif-libre.com (Luka Peschke) Date: Wed, 01 May 2019 19:33:13 +0200 Subject: [cloudkitty] May meeting is cancelled Message-ID: <27905e8e.AMQAADooJtoAAAAAAAAAAAQR_QkAAAAAZtYAAAAAAAzbjABcydha@mailjet.com> Hello everybody, Given that most of us won't be available on friday the 3rd, the cloudkitty IRC meeting that was planned at that date is cancelled. The next meeting will be held on june 7th at 15h UTC / 17h CET. 
Cheers, -- Luka Peschke From gagehugo at gmail.com Wed May 1 21:57:52 2019 From: gagehugo at gmail.com (Gage Hugo) Date: Wed, 1 May 2019 15:57:52 -0600 Subject: [security-sig] Security SIG BoF Notes Message-ID: Thanks to everyone who attended the Security SIG BoF session! Attached are the notes taken from the discussion during the session with relevant links. If there was anything missed, please feel free to mention it here or reach out in #openstack-security. Board Picture: https://drive.google.com/open?id=1YWYdp9F5faGzlww1Cr7-i2TawDh60trg Topics: - Overall Security SIG - Links: - https://security.openstack.org/ - https://wiki.openstack.org/wiki/Security-SIG - Security SIG: https://wiki.openstack.org/wiki/Security-SIG - Weekly Agenda: https://etherpad.openstack.org/p/security-agenda - Meeting Time: Weekly on Thursday at 1500 UTC #openstack-meeting - IRC Server: irc.freenode.net - Key Lime: https://github.com/keylime/keylime - Integration with Ironic https://github.com/keylime/keylime/issues/101 - Bandit: https://github.com/PyCQA/bandit - Running bandit as part of tox gate - Keystone does this: https://github.com/openstack/keystone/blob/master/tox.ini#L40 - Run as a separate job - Example (not tox): https://github.com/openstack/openstack-helm/blob/master/zuul.d/jobs-openstack-helm.yaml#L27-L36 - Host Intrusion - Wazuh was mentioned: https://wazuh.com/ - Ansible Hardening - OpenStack Ansible: https://docs.openstack.org/openstack-ansible/latest/ - Security SIG "Help Wanted" - https://docs.openstack.org/security-analysis/latest/ - Only has Barbican, missing other projects that have been added since - Multiple other libraries in review to be added - https://review.openstack.org/#/q/project:openstack/security-analysis+is:open - https://docs.openstack.org/security-guide/ - Security guide doesn’t seem to have been updated since Pike, so it’s a good 1.5 years behind - https://security.openstack.org/#secure-development-guidelines - Improve documentation of secure coding practices - improve coverage of bandit and syntribos jobs across projects, and look into other similar tools we could be using to better secure the software we write - https://wiki.openstack.org/wiki/Security_Notes - Help with writing security notes and triaging the backlog - https://wiki.openstack.org/wiki/Security/Security_Note_Process - https://bugs.launchpad.net/ossn - Security blog: http://openstack-security.github.io/ - VMT Public Bug Assistance - Many reports of suspected vulnerabilities start out as public bugs or are made public over the course of being triaged, and assistance with those is encouraged from the entire community - https://bugs.launchpad.net/ossa - Having someone who is familiar with the affected project provide context to a security bug really helps the VMT definine concrete impact statements and speeds up the overall process - Bootstrapping AWS / Windows Guest Domains / Guest VMs - nova-join: https://github.com/openstack/novajoin - application credentials: https://docs.openstack.org/keystone/latest/user/application_credentials.html - Barbican: https://wiki.openstack.org/wiki/Barbican - Policy - Cross-project policy effort: - https://governance.openstack.org/tc/goals/queens/policy-in-code.html - https://specs.openstack.org/openstack/keystone-specs/specs/keystone/ongoing/policy-goals.html -------------- next part -------------- An HTML attachment was scrubbed... 
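To make the "Running bandit as part of tox gate" item in the notes above concrete, here is a minimal, untested sketch of a tox environment. The project path below is an illustrative placeholder, not Keystone's actual configuration (that is at the tox.ini link in the notes):

    [testenv:bandit]
    deps = bandit
    commands = bandit -r yourproject -x yourproject/tests

An environment defined this way can then be run locally or as a gate job with "tox -e bandit".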
From juliaashleykreger at gmail.com  Wed May  1 22:38:37 2019
From: juliaashleykreger at gmail.com (Julia Kreger)
Date: Wed, 1 May 2019 16:38:37 -0600
Subject: [ironic][neutron][ops] Ironic multi-tenant networking, VMs
In-Reply-To: References: Message-ID:

Greetings Jeremy,

Best-practice-wise, I'm not directly aware of any. It is largely going to depend upon your Neutron ML2 drivers and network fabric. In essence, you'll need an ML2 driver which supports the vnic type of "baremetal", which is able to orchestrate the switch port binding configuration in your network fabric.

If you're using VLAN networks, in essence you'll end up with a neutron physical network which is also a trunk port to the network fabric, and the ML2 driver would then appropriately tag the port(s) for the baremetal node to the networks required. In the CI gate, we do this in the "multitenant" jobs where networking-generic-switch modifies the OVS port configurations directly.

If VXLAN is specifically what you're looking to use between VMs and baremetal nodes, I'm unsure of how you would actually configure that, but in essence the VXLANs would still need to be terminated on the switch port via the ML2 driver.

In terms of Ironic's documentation, if you haven't already seen it, you might want to check out ironic's multi-tenancy documentation[1].

-Julia

[1]: https://docs.openstack.org/ironic/latest/admin/multitenancy.html

On Wed, May 1, 2019 at 10:53 AM Jeremy Freudberg wrote:
> Hi all,
>
> I'm wondering if anyone has any best practices for Ironic bare metal
> nodes and regular VMs living on the same network. I'm sure if involves
> Ironic's `neutron` multi-tenant network driver, but I'm a bit hazy on
> the rest of the details (still very much in the early stages of
> exploring Ironic). Surely it's possible, but I haven't seen mention of
> this anywhere (except the very old spec from 2015 about introducing
> ML2 support into Ironic) nor is there a gate job resembling this
> specific use.
>
> Ideas?
>
> Thanks,
> Jeremy

From ekcs.openstack at gmail.com  Wed May  1 22:59:57 2019
From: ekcs.openstack at gmail.com (Eric K)
Date: Wed, 1 May 2019 15:59:57 -0700
Subject: [self-healing] live-migrate instance in response to fault signals
Message-ID:

Hi dasp,

Following up on the discussion today at the self-healing BoF. I think you said on the etherpad [1]:

====
Ability to drain (live migrate away) instances automatically in response to any failure/soft-fail/early failure indication (e.g. dropped packets, SMART disk status, issues with RBD connections, repeated build failures, etc)
Then quarantine, rebuild, self-test compute host (or hold for hardware fix)
Context: generally no clue what is running inside VMs (like public cloud)
====

I just want to follow up to get more info on the context; specifically, which of the following pieces are the main difficulties?
- detecting the failure/soft-fail/early failure indication
- codifying how to respond to each failure scenario
- triggering/executing the desired workflow
- something else

[1] https://etherpad.openstack.org/p/DEN-self-healing-SIG

From gn01737625 at gmail.com  Wed May  1 07:45:25 2019
From: gn01737625 at gmail.com (Ming-Che Liu)
Date: Wed, 1 May 2019 15:45:25 +0800
Subject: [Deploy problem] deploy openstack+monasca with kolla-ansible 8.0.0.0rc1.
Message-ID:

Hello,

I deployed openstack+monasca with kolla-ansible 8.0.0.0rc1.
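(In outline, the quickstart steps referred to in the next lines boil down to roughly the following commands -- a sketch assuming the stock all-in-one inventory file copied into the working directory, so exact paths may differ per install:

    kolla-genpwd
    kolla-ansible -i ./all-in-one bootstrap-servers
    kolla-ansible -i ./all-in-one prechecks
    kolla-ansible -i ./all-in-one deploy

The failures described further down occur at the final deploy step.)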
I follow the steps as mentioned in https://docs.openstack.org/kolla-ansible/latest/user/quickstart.html The setting in my computer's globals.yml as same as [Quick Start] tutorial (attached file: globals.yml is my setting). My machine environment as following: OS: Ubuntu 16.04 Kolla-ansible verions: 8.0.0.0rc1 ansible version: 2.7 When I execute [bootstrap-servers] and [prechecks], it seems ok (no fatal error or any interrupt). But when I execute [deploy], it will occur some error about rabbitmq(when I set enable_rabbitmq:yes) and nova compute service(when I set enable_rabbitmq:no). I have some detail screenshot about the errors as attached files, could you please help me to solve this problem? Thank you very much. [Attached file description]: globals.yml: my computer's setting about kolla-ansible As mentioned above, the following pictures show the errors, the rabbitmq error will occur if I set [enable_rabbitmq:yes], the nova compute service error will occur if I set [enable_rabbitmq:no]. [image: docker-version.png] [image: kolla-ansible-version.png] [image: nova-compute-service-error.png] [image: rabbitmq_error.png] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: kolla-ansible-version.png Type: image/png Size: 122071 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: rabbitmq_error.png Type: image/png Size: 245303 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: nova-compute-service-error.png Type: image/png Size: 255191 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: docker-version.png Type: image/png Size: 118420 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: globals.yml Type: application/octet-stream Size: 20214 bytes Desc: not available URL: From me at not.mn Wed May 1 23:14:25 2019 From: me at not.mn (John Dickinson) Date: Wed, 1 May 2019 17:14:25 -0600 Subject: [stable] propose Tim Burke as stable core Message-ID: <1F014297-E404-49B6-BE09-61F4DA478AF5@not.mn> Tim has been very active in proposing and maintaining patches to Swift’s stable branches. Of recent (non-automated) backports, Tim has proposed more than a third of them. --John From gmann at ghanshyammann.com Wed May 1 23:18:11 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Wed, 01 May 2019 18:18:11 -0500 Subject: [qa][ptg] QA Dinner on 2nd May @ 6.30 PM Message-ID: <16a75b0f051.f1ab3fb0153137.4432104510546283876@ghanshyammann.com> Hi All, We have planned for QA dinner on 2nd May, Thursday 6.30 PM. Anyone is welcome to join. Here are the details: Restaurant: Indian Resturant ('Little India Downtown Denver') Map: shorturl.at/byDIJ Wednesday night, 6:30 PM Meeting at the restaurant directly. -gmann From jungleboyj at gmail.com Thu May 2 00:14:04 2019 From: jungleboyj at gmail.com (Jay Bryant) Date: Wed, 1 May 2019 19:14:04 -0500 Subject: [cinder] [PTG] Room for Thursday Morning ... Message-ID: <39a39d58-ac37-a024-5014-ab5548debd8b@gmail.com> Team, There is some confusion with the schedule.  Thought we were scheduled for room 203 in the morning but we weren't. Room 112, was free so I have booked that for our use Thursday morning. See you all there.  Looking forward to a few productive days of discussion. 
Jay

From sean.mcginnis at gmx.com  Thu May  2 00:32:50 2019
From: sean.mcginnis at gmx.com (Sean McGinnis)
Date: Wed, 1 May 2019 19:32:50 -0500
Subject: [cinder][ops] Nested Quota Driver Use?
Message-ID: <20190502003249.GA1432@sm-workstation>

Hey everyone,

I'm hoping to get some feedback from folks, especially operators.

In the Liberty release, Cinder introduced the ability to use a Nested Quota Driver to handle cases of hierarchical projects and quota enforcement [0].

I have not heard of anyone actually using this. I also haven't seen any bugs filed, which makes me a little suspicious given how complicated it can be.

I would like to know if any operators are using this for nested quotas. There is an effort underway for a new mechanism called "unified limits" that will require a lot of modifications to the Cinder code. If this quota driver is not needed, I would like to deprecate it in Train so it can be removed in the U release and hopefully prevent some unnecessary work being done.

Any feedback on this would be appreciated. Thanks!

Sean

[0] https://specs.openstack.org/openstack/cinder-specs/specs/liberty/cinder-nested-quota-driver.html

From massimo.sgaravatto at gmail.com  Thu May  2 07:03:09 2019
From: massimo.sgaravatto at gmail.com (Massimo Sgaravatto)
Date: Thu, 2 May 2019 09:03:09 +0200
Subject: [glance] [ops] Issue sharing an image with another project (something related to get_image_location)
In-Reply-To: References: Message-ID:

Hi Brian,

Thanks for your reply. A couple of answers in-line:

On Wed, May 1, 2019 at 3:25 PM Brian Rosmaita wrote:

> (Apologies for top-posting.)
>
> Hi Massimo,
>
> Two things:
>
> (1) Please file a glance bug for this. I didn't think the sharing code
> would touch image locations, but apparently it does. In the bug report,
> please include your policy settings for *_location and *_member, and
> also the output of an image-show call for the image you're trying to
> share, and the log extract.

Sure: I will

> (2) With the policy settings you have for *_location, I don't think that
> any regular (non-admin) user will be able to download an image or boot
> an instance from an image, so you should verify those operations.
Actually it works E.g.: $ openstack image show 7ebe160d-5498-477b-aa2e-94a6d962a075 +------------------+------------------------------------------------------------------------------+ | Field | Value | +------------------+------------------------------------------------------------------------------+ | checksum | b4548edf0bc476c50c083fb88717d92f | | container_format | bare | | created_at | 2018-01-15T16:14:35Z | | disk_format | qcow2 | | file | /v2/images/7ebe160d-5498-477b-aa2e-94a6d962a075/file | | id | 7ebe160d-5498-477b-aa2e-94a6d962a075 | | min_disk | 3 | | min_ram | 512 | | name | CentOS7 | | owner | 56c3f5c047e74a78a71438c4412e6e13 | | properties | locations='[]', os_hash_algo='None', os_hash_value='None', os_hidden='False' | | protected | False | | schema | /v2/schemas/image | | size | 877985792 | | status | active | | tags | | | updated_at | 2018-01-15T16:21:23Z | | virtual_size | None | | visibility | public | +------------------+------------------------------------------------------------------------------+ So locations are not showed, as expected, since I am a 'regular' (non-admin) user But I able to download the image: $ openstack image save --file ~/CentOS7.qcow2 7ebe160d-5498-477b-aa2e-94a6d962a075 $ ls -l ~/CentOS7.qcow2 -rw-r--r-- 1 sgaravat utenti 877985792 May 2 08:54 /home/sgaravat/CentOS7.qcow2 $ md5sum ~/CentOS7.qcow2 b4548edf0bc476c50c083fb88717d92f /home/sgaravat/CentOS7.qcow2 I am also able to launch an instance using this image Thanks, Massimo > -------------- next part -------------- An HTML attachment was scrubbed... URL: From eyalb1 at gmail.com Thu May 2 07:12:12 2019 From: eyalb1 at gmail.com (Eyal B) Date: Thu, 2 May 2019 10:12:12 +0300 Subject: [Vitrage] add datasource kapacitor for vitrage In-Reply-To: <1324083046.973516.1556615406841.JavaMail.zimbra@viettel.com.vn> References: <14511424.947437.1556614048877.JavaMail.zimbra@viettel.com.vn> <1324083046.973516.1556615406841.JavaMail.zimbra@viettel.com.vn> Message-ID: Hi, Please make sure all test are passing Eyal On Thu, May 2, 2019, 02:18 wrote: > Hi, > In our system, we use monitor by TICK stack (include: Telegraf for > collect metric, InfluxDB for storage metric, Chronograf for visualize and > Kapacitor alarming), which is popular monitor solution. > We hope can integrate vitrage in, so we decide to write kapacitor > datasource contribute for vitrage. > The work is almost done , you can review in: > https://review.opendev.org/653416 > > So i send this mail hope for more review, ideal,... Appreciate it. > also ask: have any step i miss in pipeline of contribute datasource > vitrage? like create blueprints, vitrage-spec,vv.. Should i do it? > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ceo at teo-en-ming-corp.com Thu May 2 08:03:30 2019 From: ceo at teo-en-ming-corp.com (Turritopsis Dohrnii Teo En Ming) Date: Thu, 2 May 2019 08:03:30 +0000 Subject: Which are the most popular free open source OpenStack cloud operating systems or distros? Message-ID: Subject/Topic: Which are the most popular free open source OpenStack cloud operating systems or distros? Good afternoon from Singapore, First of all, I am very new to OpenStack. May I know which are the most popular free open source OpenStack cloud operating systems or distros? How do I download, install and deploy these OpenStack distros as private cloud, public cloud or hybrid cloud? Where can I find good and detailed documentation? Thank you very much for your advice. 
-----BEGIN EMAIL SIGNATURE----- The Gospel for all Targeted Individuals (TIs): [The New York Times] Microwave Weapons Are Prime Suspect in Ills of U.S. Embassy Workers Link: https://www.nytimes.com/2018/09/01/science/sonic-attack-cuba-microwave.html ******************************************************************************************** Singaporean Mr. Turritopsis Dohrnii Teo En Ming's Academic Qualifications as at 14 Feb 2019 [1] https://tdtemcerts.wordpress.com/ [2] https://tdtemcerts.blogspot.sg/ [3] https://www.scribd.com/user/270125049/Teo-En-Ming -----END EMAIL SIGNATURE----- From berndbausch at gmail.com Thu May 2 08:20:13 2019 From: berndbausch at gmail.com (Bernd Bausch) Date: Thu, 2 May 2019 17:20:13 +0900 Subject: Which are the most popular free open source OpenStack cloud operating systems or distros? In-Reply-To: References: Message-ID: <762494EA-9BC6-45E1-A75C-5D0DAC488DE3@gmail.com> I am not aware of a popularity ranking, but the usual commercial Linux vendors and a large number of other providers offer distros. See https://www.openstack.org/marketplace/distros/ for a list. Download and documentation details are available at the vendors’ web sites. Since OpenStack is open-source, so are the distros. In addition, you find non-commercial deployment tools on the documentation web site https://docs.openstack.org/stein/deploy/. You can also hand-craft your cloud: https://docs.openstack.org/stein/install/. Bernd > On May 2, 2019, at 17:03, Turritopsis Dohrnii Teo En Ming wrote: > > Subject/Topic: Which are the most popular free open source OpenStack cloud operating systems or distros? > > Good afternoon from Singapore, > > First of all, I am very new to OpenStack. > > May I know which are the most popular free open source OpenStack cloud operating systems or distros? > > How do I download, install and deploy these OpenStack distros as private cloud, public cloud or hybrid cloud? > > Where can I find good and detailed documentation? > > Thank you very much for your advice. > > -----BEGIN EMAIL SIGNATURE----- > > The Gospel for all Targeted Individuals (TIs): > > [The New York Times] Microwave Weapons Are Prime Suspect in Ills of > U.S. Embassy Workers > > Link: https://www.nytimes.com/2018/09/01/science/sonic-attack-cuba-microwave.html > > ******************************************************************************************** > > Singaporean Mr. Turritopsis Dohrnii Teo En Ming's Academic > Qualifications as at 14 Feb 2019 > > [1] https://tdtemcerts.wordpress.com/ > > [2] https://tdtemcerts.blogspot.sg/ > > [3] https://www.scribd.com/user/270125049/Teo-En-Ming > > -----END EMAIL SIGNATURE----- > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at stackhpc.com Thu May 2 08:21:39 2019 From: doug at stackhpc.com (Doug Szumski) Date: Thu, 2 May 2019 09:21:39 +0100 Subject: [Deploy problem] deploy openstack+monasca with kolla-ansible 8.0.0.0rc1. [kolla] In-Reply-To: References: Message-ID: <10f217bf-33a2-d40a-8bcf-6994c26be699@stackhpc.com> On 01/05/2019 08:45, Ming-Che Liu wrote: > Hello, > > I deployed openstack+monasca with kolla-ansible 8.0.0.0rc1. It doesn't look like Monasca is enabled in your globals.yml file. Are you trying to set up OpenStack services first and then enable Monasca afterwards? 
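For reference, enabling Monasca in a kolla-ansible deployment is normally just a switch in /etc/kolla/globals.yml alongside the usual networking settings. A minimal sketch follows; the interface names and VIP address are illustrative placeholders rather than values taken from this environment:

    network_interface: "eth0"
    neutron_external_interface: "eth1"
    kolla_internal_vip_address: "10.1.0.250"
    enable_monasca: "yes"

After changing globals.yml, a subsequent deploy run should pick the new service up.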
You can also deploy Monasca standalone if that is useful: https://docs.openstack.org/kolla-ansible/latest/reference/logging-and-monitoring/monasca-guide.html > > I follow the steps as mentioned in > https://docs.openstack.org/kolla-ansible/latest/user/quickstart.html > > The setting in my computer's globals.yml as same as [Quick Start] > tutorial (attached file: globals.yml is my setting). > > My machine environment as following: > OS: Ubuntu 16.04 > Kolla-ansible verions: 8.0.0.0rc1 > ansible version: 2.7 > > When I execute [bootstrap-servers] and [prechecks], it seems ok (no > fatal error or any interrupt). > > But when I execute [deploy], it will occur some error about > rabbitmq(when I set enable_rabbitmq:yes) and nova compute service(when > I set  enable_rabbitmq:no). > > I have some detail screenshot about the errors as attached files, > could you please help me to solve this problem? Please can you post more information on why the containers are not starting. - Inspect rabbit and nova-compute logs (in /var/lib/docker/volumes/kolla_logs/_data/) - Check relevant containers are running, and if they are restarting check the output. Eg. docker logs --follow nova_compute > > Thank you very much. > > [Attached file description]: > globals.yml: my computer's setting about kolla-ansible > > As mentioned above, the following pictures show the errors, the > rabbitmq error will occur if I set [enable_rabbitmq:yes], the nova > compute service error will occur if I set [enable_rabbitmq:no]. > docker-version.png > kolla-ansible-version.png > nova-compute-service-error.png > rabbitmq_error.png From massimo.sgaravatto at gmail.com Thu May 2 08:28:22 2019 From: massimo.sgaravatto at gmail.com (Massimo Sgaravatto) Date: Thu, 2 May 2019 10:28:22 +0200 Subject: [glance] [ops] Issue sharing an image with another project (something related to get_image_location) In-Reply-To: References: Message-ID: On Thu, May 2, 2019 at 9:03 AM Massimo Sgaravatto < massimo.sgaravatto at gmail.com> wrote: > > Hi Brian > > Thanks for your > A couple of answers in-line: > > On Wed, May 1, 2019 at 3:25 PM Brian Rosmaita > wrote: > >> (Apologies for top-posting.) >> >> Hi Massimo, >> >> Two things: >> >> (1) Please file a glance bug for this. I didn't think the sharing code >> would touch image locations, but apparently it does. In the bug report, >> please include your policy settings for *_location and *_member, and >> also the output of an image-show call for the image you're trying to >> share, and the log extract. >> > > Sure: I will > https://bugs.launchpad.net/glance/+bug/1827342 Thanks again, Massimo -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at stackhpc.com Thu May 2 09:56:47 2019 From: doug at stackhpc.com (Doug Szumski) Date: Thu, 2 May 2019 10:56:47 +0100 Subject: [Deploy problem] deploy openstack+monasca with kolla-ansible 8.0.0.0rc1. [kolla] In-Reply-To: References: <10f217bf-33a2-d40a-8bcf-6994c26be699@stackhpc.com> Message-ID: On 02/05/2019 10:13, Ming-Che Liu wrote: > Hello, > > Thank you for replying, my goal is to deploy [all-in-one] > openstack+monasca(in the same physical machine/VM). > > I will check the detail error information and provide such logs for > you, thank you. 
> > I also have a question about kolla-ansible 8.0.0.0rc1, when I check > the new feature about kolla-ansible 8.0.0.0rc1, it seems only > 8.0.0.0rc1 provide the "complete" monasca functionality, it that > right(that means you can see monasca's plugin in openstack horizon, as > the following picture)? > You are correct that Monasca is supported from the Stein release onwards. Due to a number of people asking we have created a backport to Rocky, but the patches are not merged yet. Please see this bug for a link to the patch chains: https://bugs.launchpad.net/kolla-ansible/+bug/1824982 The horizon-ui-plugin isn't currently installed in the Horizon image, but I can easily add a patch for it. Similar functionality is currently provided by the monasca-grafana fork (which provides Keystone integration), for example: Menu Overview > Thank you very much. > > Regards, > > Shawn > > monasca.png > > > Doug Szumski > 於 > 2019年5月2日 週四 下午4:21寫道: > > > On 01/05/2019 08:45, Ming-Che Liu wrote: > > Hello, > > > > I deployed openstack+monasca with kolla-ansible 8.0.0.0rc1. > > It doesn't look like Monasca is enabled in your globals.yml file. Are > you trying to set up OpenStack services first and then enable Monasca > afterwards? You can also deploy Monasca standalone if that is useful: > > https://docs.openstack.org/kolla-ansible/latest/reference/logging-and-monitoring/monasca-guide.html > > > > > I follow the steps as mentioned in > > https://docs.openstack.org/kolla-ansible/latest/user/quickstart.html > > > > The setting in my computer's globals.yml as same as [Quick Start] > > tutorial (attached file: globals.yml is my setting). > > > > My machine environment as following: > > OS: Ubuntu 16.04 > > Kolla-ansible verions: 8.0.0.0rc1 > > ansible version: 2.7 > > > > When I execute [bootstrap-servers] and [prechecks], it seems ok (no > > fatal error or any interrupt). > > > > But when I execute [deploy], it will occur some error about > > rabbitmq(when I set enable_rabbitmq:yes) and nova compute > service(when > > I set  enable_rabbitmq:no). > > > > I have some detail screenshot about the errors as attached files, > > could you please help me to solve this problem? > > Please can you post more information on why the containers are not > starting. > > - Inspect rabbit and nova-compute logs (in > /var/lib/docker/volumes/kolla_logs/_data/) > > - Check relevant containers are running, and if they are restarting > check the output. Eg. docker logs --follow nova_compute > > > > > Thank you very much. > > > > [Attached file description]: > > globals.yml: my computer's setting about kolla-ansible > > > > As mentioned above, the following pictures show the errors, the > > rabbitmq error will occur if I set [enable_rabbitmq:yes], the nova > > compute service error will occur if I set [enable_rabbitmq:no]. > > docker-version.png > > kolla-ansible-version.png > > nova-compute-service-error.png > > rabbitmq_error.png > From doug at stackhpc.com Thu May 2 09:58:45 2019 From: doug at stackhpc.com (Doug Szumski) Date: Thu, 2 May 2019 10:58:45 +0100 Subject: [Deploy problem] deploy openstack+monasca with kolla-ansible 8.0.0.0rc1. [kolla] In-Reply-To: References: <10f217bf-33a2-d40a-8bcf-6994c26be699@stackhpc.com> Message-ID: On 02/05/2019 10:56, Doug Szumski wrote: > > On 02/05/2019 10:13, Ming-Che Liu wrote: >> Hello, >> >> Thank you for replying, my goal is to deploy [all-in-one] >> openstack+monasca(in the same physical machine/VM). 
>> >> I will check the detail error information and provide such logs for >> you, thank you. >> >> I also have a question about kolla-ansible 8.0.0.0rc1, when I check >> the new feature about kolla-ansible 8.0.0.0rc1, it seems only >> 8.0.0.0rc1 provide the "complete" monasca functionality, it that >> right(that means you can see monasca's plugin in openstack horizon, >> as the following picture)? >> > You are correct that Monasca is supported from the Stein release > onwards. Due to a number of people asking we have created a backport > to Rocky, but the patches are not merged yet. Please see this bug for > a link to the patch chains: > https://bugs.launchpad.net/kolla-ansible/+bug/1824982 > > The horizon-ui-plugin isn't currently installed in the Horizon image, > but I can easily add a patch for it. > > Similar functionality is currently provided by the monasca-grafana > fork (which provides Keystone integration), for example: > Apologies, the images were stripped, please see this link: https://github.com/monasca/monasca-grafana > > Menu > > Overview > >> Thank you very much. >> >> Regards, >> >> Shawn >> >> monasca.png >> >> >> Doug Szumski > 於 >> 2019年5月2日 週四 下午4:21寫道: >> >> >>     On 01/05/2019 08:45, Ming-Che Liu wrote: >>     > Hello, >>     > >>     > I deployed openstack+monasca with kolla-ansible 8.0.0.0rc1. >> >>     It doesn't look like Monasca is enabled in your globals.yml file. >> Are >>     you trying to set up OpenStack services first and then enable >> Monasca >>     afterwards? You can also deploy Monasca standalone if that is >> useful: >> >> https://docs.openstack.org/kolla-ansible/latest/reference/logging-and-monitoring/monasca-guide.html >> >>     > >>     > I follow the steps as mentioned in >>     > >> https://docs.openstack.org/kolla-ansible/latest/user/quickstart.html >>     > >>     > The setting in my computer's globals.yml as same as [Quick Start] >>     > tutorial (attached file: globals.yml is my setting). >>     > >>     > My machine environment as following: >>     > OS: Ubuntu 16.04 >>     > Kolla-ansible verions: 8.0.0.0rc1 >>     > ansible version: 2.7 >>     > >>     > When I execute [bootstrap-servers] and [prechecks], it seems ok >> (no >>     > fatal error or any interrupt). >>     > >>     > But when I execute [deploy], it will occur some error about >>     > rabbitmq(when I set enable_rabbitmq:yes) and nova compute >>     service(when >>     > I set  enable_rabbitmq:no). >>     > >>     > I have some detail screenshot about the errors as attached files, >>     > could you please help me to solve this problem? >> >>     Please can you post more information on why the containers are not >>     starting. >> >>     - Inspect rabbit and nova-compute logs (in >>     /var/lib/docker/volumes/kolla_logs/_data/) >> >>     - Check relevant containers are running, and if they are restarting >>     check the output. Eg. docker logs --follow nova_compute >> >>     > >>     > Thank you very much. >>     > >>     > [Attached file description]: >>     > globals.yml: my computer's setting about kolla-ansible >>     > >>     > As mentioned above, the following pictures show the errors, the >>     > rabbitmq error will occur if I set [enable_rabbitmq:yes], the nova >>     > compute service error will occur if I set [enable_rabbitmq:no]. 
>>     > docker-version.png
>>     > kolla-ansible-version.png
>>     > nova-compute-service-error.png
>>     > rabbitmq_error.png
>>
From doka.ua at gmx.com  Thu May 2 10:27:36 2019
From: doka.ua at gmx.com (Volodymyr Litovka)
Date: Thu, 2 May 2019 13:27:36 +0300
Subject: [octavia] Error while creating amphora
Message-ID: 

Dear colleagues,

I'm using Openstack Rocky and trying to launch Octavia 4.0.0. After all
installation steps I've got an error during 'openstack loadbalancer create'
with the following log:

DEBUG octavia.controller.worker.tasks.compute_tasks [-] Compute create execute for amphora with id d037721f-2cf9-492e-99cb-0be5874da0f6 execute /opt/openstack/lib/python3.6/site-packages/octavia/controller/worker/tasks/compute_tasks.py:63
ERROR octavia.controller.worker.tasks.compute_tasks [-] Compute create for amphora id: d037721f-2cf9-492e-99cb-0be5874da0f6 failed: TypeError: can't concat str to bytes
ERROR octavia.controller.worker.tasks.compute_tasks Traceback (most recent call last):
ERROR octavia.controller.worker.tasks.compute_tasks   File "/opt/openstack/lib/python3.6/site-packages/octavia/controller/worker/tasks/compute_tasks.py", line 94, in execute
ERROR octavia.controller.worker.tasks.compute_tasks     config_drive_files)
ERROR octavia.controller.worker.tasks.compute_tasks   File "/opt/openstack/lib/python3.6/site-packages/octavia/common/jinja/user_data_jinja_cfg.py", line 38, in build_user_data_config
ERROR octavia.controller.worker.tasks.compute_tasks     return self.agent_template.render(user_data=user_data)
ERROR octavia.controller.worker.tasks.compute_tasks   File "/opt/openstack/lib/python3.6/site-packages/jinja2/asyncsupport.py", line 76, in render
ERROR octavia.controller.worker.tasks.compute_tasks     return original_render(self, *args, **kwargs)
ERROR octavia.controller.worker.tasks.compute_tasks   File "/opt/openstack/lib/python3.6/site-packages/jinja2/environment.py", line 1008, in render
ERROR octavia.controller.worker.tasks.compute_tasks     return self.environment.handle_exception(exc_info, True)
ERROR octavia.controller.worker.tasks.compute_tasks   File "/opt/openstack/lib/python3.6/site-packages/jinja2/environment.py", line 780, in handle_exception
ERROR octavia.controller.worker.tasks.compute_tasks     reraise(exc_type, exc_value, tb)
ERROR octavia.controller.worker.tasks.compute_tasks   File "/opt/openstack/lib/python3.6/site-packages/jinja2/_compat.py", line 37, in reraise
ERROR octavia.controller.worker.tasks.compute_tasks     raise value.with_traceback(tb)
ERROR octavia.controller.worker.tasks.compute_tasks   File "/opt/openstack/lib/python3.6/site-packages/octavia/common/jinja/templates/user_data_config_drive.template", line 29, in top-level template code
ERROR octavia.controller.worker.tasks.compute_tasks     {{ value|indent(8) }}
ERROR octavia.controller.worker.tasks.compute_tasks   File "/opt/openstack/lib/python3.6/site-packages/jinja2/filters.py", line 557, in do_indent
ERROR octavia.controller.worker.tasks.compute_tasks     s += u'\n'  # this quirk is necessary for splitlines method
ERROR octavia.controller.worker.tasks.compute_tasks TypeError: can't concat str to bytes
ERROR octavia.controller.worker.tasks.compute_tasks
WARNING octavia.controller.worker.controller_worker [-] Task 'STANDALONE-octavia-create-amp-for-lb-subflow-octavia-cert-compute-create' (06134192-def9-420c-9feb-0d08a068f3b2) transitioned into state 'FAILURE' from state 'RUNNING'

Any advice on where the problem is?
My environment: - Openstack Rocky - Ubuntu 18.04 - Octavia installed in virtualenv using pip install: # pip list |grep octavia octavia 4.0.0 octavia-lib 1.1.1 python-octaviaclient 1.8.0 Thank you. -- Volodymyr Litovka "Vision without Execution is Hallucination." -- Thomas Edison -- Volodymyr Litovka "Vision without Execution is Hallucination." -- Thomas Edison -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at doughellmann.com Thu May 2 11:03:29 2019 From: doug at doughellmann.com (Doug Hellmann) Date: Thu, 2 May 2019 05:03:29 -0600 Subject: [requirements][qa][all] mock 3.0.0 released References: Message-ID: <4E536AA4-479B-4A84-A1D2-91FF8FAD122C@doughellmann.com> There's a major version bump of one of our testing dependencies, so watch for new unit test job failures. Doug > Begin forwarded message: > > From: Chris Withers > Subject: [TIP] mock 3.0.0 released > Date: May 2, 2019 at 2:07:34 AM MDT > To: "testing-in-python at lists.idyll.org" , Python List > > Hi All, > > I'm pleased to announce the release of mock 3.0.0: > https://pypi.org/project/mock/ > > This brings to rolling backport up to date with cpython master. > > It's been a few years since the last release, so I'd be surprised if there weren't some problems. > If you hit any issues, please pin to mock<3 and then: > > - If your issue relates to mock functionality, please report in the python tracker: https://bugs.python.org > > - If your issue is specific to the backport, please report here: https://github.com/testing-cabal/mock/issues > > If you're unsure, go for the second one and we'll figure it out. > > cheers, > > Chris > > _______________________________________________ > testing-in-python mailing list > testing-in-python at lists.idyll.org > http://lists.idyll.org/listinfo/testing-in-python -------------- next part -------------- An HTML attachment was scrubbed... URL: From florian.engelmann at everyware.ch Thu May 2 11:56:20 2019 From: florian.engelmann at everyware.ch (Florian Engelmann) Date: Thu, 2 May 2019 13:56:20 +0200 Subject: [nova usage] openstack usage CLI output differs from horizon output Message-ID: Hi, as far as I understood Horizon overview usage should give the same numbers as openstack usage show --project --start 2019-03-01 --end 2019-03-31 But in our deployment (rocky) the Horizon numbers are higher (~15%). Any idea why? Could be a bug? All the best, Flo -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5230 bytes Desc: not available URL: From doka.ua at gmx.com Thu May 2 12:42:10 2019 From: doka.ua at gmx.com (Volodymyr Litovka) Date: Thu, 2 May 2019 15:42:10 +0300 Subject: [octavia] anchor discountinued? Message-ID: <78073332-bb86-b00b-6aaf-8e309cbcd160@gmx.com> Dear colleagues, it seems Anchor, which is used by Octavia as PKI system, is discontinued. Is there replacement for Anchor which can be used with Octavia? Thank you. -- Volodymyr Litovka "Vision without Execution is Hallucination." -- Thomas Edison From saphi070 at gmail.com Thu May 2 12:43:43 2019 From: saphi070 at gmail.com (Sa Pham) Date: Thu, 2 May 2019 21:43:43 +0900 Subject: [octavia] anchor discountinued? In-Reply-To: <78073332-bb86-b00b-6aaf-8e309cbcd160@gmx.com> References: <78073332-bb86-b00b-6aaf-8e309cbcd160@gmx.com> Message-ID: Hi Volodymyr, You mean SSL Certificate for Octavia, You can use Barbican. 
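For example, a certificate bundle can be stored as a Barbican secret like
this (the secret name and file path here are placeholders, just to
illustrate the client call):

openstack secret store --name tls_secret1 \
  --payload-content-type='text/plain' \
  --payload="$(cat server_bundle.pem)"

The secret reference returned by Barbican is what you would then point the
listener at for TLS termination.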
On Thu, May 2, 2019 at 9:42 PM Volodymyr Litovka wrote: > Dear colleagues, > > it seems Anchor, which is used by Octavia as PKI system, is > discontinued. Is there replacement for Anchor which can be used with > Octavia? > > Thank you. > > -- > Volodymyr Litovka > "Vision without Execution is Hallucination." -- Thomas Edison > > > -- Sa Pham Dang Master Student - Soongsil University Kakaotalk: sapd95 Skype: great_bn -------------- next part -------------- An HTML attachment was scrubbed... URL: From zigo at debian.org Thu May 2 12:45:10 2019 From: zigo at debian.org (Thomas Goirand) Date: Thu, 2 May 2019 14:45:10 +0200 Subject: Which are the most popular free open source OpenStack cloud operating systems or distros? In-Reply-To: <762494EA-9BC6-45E1-A75C-5D0DAC488DE3@gmail.com> References: <762494EA-9BC6-45E1-A75C-5D0DAC488DE3@gmail.com> Message-ID: <9328228c-e244-e695-84b6-73b4eaf86c41@debian.org> On 5/2/19 10:20 AM, Bernd Bausch wrote: > I am not aware of a popularity ranking, but the usual commercial Linux > vendors and a large number of other providers offer distros. > See https://www.openstack.org/marketplace/distros/ for a list. Download > and documentation details are available at the vendors’ web sites. Since > OpenStack is open-source, so are the distros. > > In addition, you find non-commercial deployment tools on the > documentation web site https://docs.openstack.org/stein/deploy/. It's not non-commercial list, it's openstack-community-maintained list. For example, my own tool [1] isn't listed despite [2]. Cheers, Thomas Goirand (zigo) [1] https://salsa.debian.org/openstack-team/debian/openstack-cluster-installer [2] https://review.opendev.org/618111 From doka.ua at gmx.com Thu May 2 13:03:54 2019 From: doka.ua at gmx.com (Volodymyr Litovka) Date: Thu, 2 May 2019 16:03:54 +0300 Subject: [octavia] anchor discountinued? In-Reply-To: References: <78073332-bb86-b00b-6aaf-8e309cbcd160@gmx.com> Message-ID: <53e59d83-03e2-a333-1277-03d02ba2120d@gmx.com> Hi Sa, as far as I understand, Octavia uses Barbican for storing certs for TLS offload. While Anchor used for signing certs/keys when doing provisioning of Amphoraes. On 5/2/19 3:43 PM, Sa Pham wrote: > Hi Volodymyr, > > You mean SSL Certificate for Octavia, You can use Barbican. > > > > On Thu, May 2, 2019 at 9:42 PM Volodymyr Litovka > wrote: > > Dear colleagues, > > it seems Anchor, which is used by Octavia as PKI system, is > discontinued. Is there replacement for Anchor which can be used with > Octavia? > > Thank you. > > -- > Volodymyr Litovka >    "Vision without Execution is Hallucination." -- Thomas Edison > > > > > -- > Sa Pham Dang > Master Student - Soongsil University > Kakaotalk: sapd95 > Skype: great_bn > > -- Volodymyr Litovka "Vision without Execution is Hallucination." -- Thomas Edison -------------- next part -------------- An HTML attachment was scrubbed... URL: From lajos.katona at ericsson.com Thu May 2 13:35:39 2019 From: lajos.katona at ericsson.com (Lajos Katona) Date: Thu, 2 May 2019 13:35:39 +0000 Subject: [openstack-dev] [neutron] PTG agenda In-Reply-To: References: Message-ID: <1335a531-9e74-d900-07a9-a6aa4ce285f4@ericsson.com> Hi Miguel, Just a note, the pad is not on the "official" list of pads here: https://wiki.openstack.org/wiki/Forum/Denver2019 Regards Lajos On 2019. 04. 29. 16:46, Miguel Lavalle wrote: > Hi Neutrinos,, > > I took your proposals for PTG topics and organized them in an agenda. > Please look at > https://etherpad.openstack.org/p/openstack-networking-train-ptg. 
Let's > have a very productive meeting! > > Best regards > > Miguel From lajos.katona at ericsson.com Thu May 2 13:41:11 2019 From: lajos.katona at ericsson.com (Lajos Katona) Date: Thu, 2 May 2019 13:41:11 +0000 Subject: [openstack-dev] [neutron] PTG agenda In-Reply-To: <1335a531-9e74-d900-07a9-a6aa4ce285f4@ericsson.com> References: <1335a531-9e74-d900-07a9-a6aa4ce285f4@ericsson.com> Message-ID: Sorry, This is the PTG page: https://wiki.openstack.org/wiki/PTG/Train/Etherpads and of course neutron is there..... On 2019. 05. 02. 7:35, Lajos Katona wrote: > Hi Miguel, > > Just a note, the pad is not on the "official" list of pads here: > https://wiki.openstack.org/wiki/Forum/Denver2019 > > Regards > Lajos > > On 2019. 04. 29. 16:46, Miguel Lavalle wrote: >> Hi Neutrinos,, >> >> I took your proposals for PTG topics and organized them in an agenda. >> Please look at >> https://etherpad.openstack.org/p/openstack-networking-train-ptg. >> Let's have a very productive meeting! >> >> Best regards >> >> Miguel > From Tim.Bell at cern.ch Thu May 2 13:49:28 2019 From: Tim.Bell at cern.ch (Tim Bell) Date: Thu, 2 May 2019 13:49:28 +0000 Subject: Which are the most popular free open source OpenStack cloud operating systems or distros? In-Reply-To: References: Message-ID: The OpenStack community takes part in an annual survey which can be useful for this sort of information. Details of 2018 report are at https://www.openstack.org/user-survey/2018-user-survey-report/ The 2019 user survey is also now open so you can create/update your install details at https://www.openstack.org/user-survey/survey-2019/landing Tim -----Original Message----- From: Turritopsis Dohrnii Teo En Ming Date: Thursday, 2 May 2019 at 02:09 To: "openstack-discuss at lists.openstack.org" Cc: Turritopsis Dohrnii Teo En Ming Subject: Which are the most popular free open source OpenStack cloud operating systems or distros? Subject/Topic: Which are the most popular free open source OpenStack cloud operating systems or distros? Good afternoon from Singapore, First of all, I am very new to OpenStack. May I know which are the most popular free open source OpenStack cloud operating systems or distros? How do I download, install and deploy these OpenStack distros as private cloud, public cloud or hybrid cloud? Where can I find good and detailed documentation? Thank you very much for your advice. -----BEGIN EMAIL SIGNATURE----- The Gospel for all Targeted Individuals (TIs): [The New York Times] Microwave Weapons Are Prime Suspect in Ills of U.S. Embassy Workers Link: https://www.nytimes.com/2018/09/01/science/sonic-attack-cuba-microwave.html ******************************************************************************************** Singaporean Mr. Turritopsis Dohrnii Teo En Ming's Academic Qualifications as at 14 Feb 2019 [1] https://tdtemcerts.wordpress.com/ [2] https://tdtemcerts.blogspot.sg/ [3] https://www.scribd.com/user/270125049/Teo-En-Ming -----END EMAIL SIGNATURE----- From openstack at hauschild.it Thu May 2 14:11:18 2019 From: openstack at hauschild.it (Hartwig Hauschild) Date: Thu, 2 May 2019 16:11:18 +0200 Subject: properly sizing openstack controlplane infrastructure In-Reply-To: <1A3C52DFCD06494D8528644858247BF01C30E5B5@EX10MBOX03.pnnl.gov> References: <20190430153021.jhdgri7g2nvpn5vj@alle-irre.de> <1A3C52DFCD06494D8528644858247BF01C30E5B5@EX10MBOX03.pnnl.gov> Message-ID: <20190502141117.ukowjeuwqxmwphsv@alle-irre.de> Am 30.04.2019 schrieb Fox, Kevin M: > I've run that same network config at about 70 nodes with no problems. 
I've run the same without dvr at 150 nodes. > > Your memory usage seems very high. I ran 150 nodes with a small 16g server ages ago. Might double check that. > That's what I was thinking as well, but it did not match up with what we currently have at all. I'll need to figure out what went wrong here. -- Cheers, Hardy From strigazi at gmail.com Thu May 2 14:11:22 2019 From: strigazi at gmail.com (Spyros Trigazis) Date: Thu, 2 May 2019 08:11:22 -0600 Subject: [magnm][ptg] Room for magnum this afternoon Message-ID: Hello everyone, Magnum will have two PTG sessions this afternoon [0] in Room 112. Note that magnum's track is not in the printed scheduled you have taken from the registration desk. You can join remotely in this etherpad [1]. Cheers, Spyros [0] http://ptg.openstack.org/ptg.html [1] https://etherpad.openstack.org/p/magnum-train-ptg -------------- next part -------------- An HTML attachment was scrubbed... URL: From strigazi at gmail.com Thu May 2 14:13:03 2019 From: strigazi at gmail.com (Spyros Trigazis) Date: Thu, 2 May 2019 08:13:03 -0600 Subject: [magnum][ptg] Room for magnum this afternoon In-Reply-To: References: Message-ID: I did a typo in the subject. Cheers, Spyros On Thu, May 2, 2019 at 8:11 AM Spyros Trigazis wrote: > Hello everyone, > > Magnum will have two PTG sessions this afternoon [0] in Room 112. > Note that magnum's track is not in the printed scheduled you have taken > from the registration desk. > > You can join remotely in this etherpad [1]. > > Cheers, > Spyros > > [0] http://ptg.openstack.org/ptg.html > [1] https://etherpad.openstack.org/p/magnum-train-ptg > -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at hauschild.it Thu May 2 14:15:17 2019 From: openstack at hauschild.it (Hartwig Hauschild) Date: Thu, 2 May 2019 16:15:17 +0200 Subject: properly sizing openstack controlplane infrastructure In-Reply-To: <2114088542.1037659.1556641040235.JavaMail.zimbra@speichert.pl> References: <20190430153021.jhdgri7g2nvpn5vj@alle-irre.de> <2114088542.1037659.1556641040235.JavaMail.zimbra@speichert.pl> Message-ID: <20190502141517.muvcrkh6wej3i7wo@alle-irre.de> Am 30.04.2019 schrieb Daniel Speichert: > ----- Original Message ----- > > From: "Hartwig Hauschild" > > To: openstack-discuss at lists.openstack.org > > Sent: Tuesday, April 30, 2019 9:30:22 AM > > Subject: properly sizing openstack controlplane infrastructure > > > The requirements we've got are basically "here's 50 compute-nodes, make sure > > whatever you're building scales upwards from there". > > It depends what's your end goal. 100? 500? >1000 nodes? > At some point things like Nova Cells will help (or become necessity). I really hope not that high, but splitting into cells or AZs / Regions is definitely planned if it goes up. > > The pike-stack has three servers as control-plane, each of them with 96G of > > RAM and they don't seem to have too much room left when coordinating 14 > > compute-nodes. > > 96 GB of RAM per controller is much more than enough for 14 compute nodes. > There's room for improvement in configuration. > > > We're thinking about splitting the control-nodes into infrastructure > > (db/rabbit/memcache) and API. > > > > What would I want to look for when sizing those control-nodes? I've not been > > able to find any references for this at all, just rather nebulous '8G RAM > > should do' which is around what our rabbit currently inhales. 
> > You might want to check out Performance Docs: > https://docs.openstack.org/developer/performance-docs/ > > For configuration tips, I'd suggest looking at what openstack-ansible > or similar projects provide as "battle-tested" configuration. > It's a good baseline reference before you tune yourself. > Problem is: For all I know this is a non-tuned openstack-ansible-setup. I guess I'll have to figure out why it's using way more memory than it should (and run out every now and then). Thanks, -- cheers, Hardy From arbermejo0417 at gmail.com Thu May 2 14:23:07 2019 From: arbermejo0417 at gmail.com (Alejandro Ruiz Bermejo) Date: Thu, 2 May 2019 10:23:07 -0400 Subject: [ETCD] client: etcd member http://controller:2379 has no leader Message-ID: Hi, i'm installing Zun in Openstack Queens with Ubuntu 18.04.1 LTS, i already have configured docker and kuyr-libnetwork. I'm following the guide at https://docs.openstack.org/zun/queens/install/index.html. I followed all the steps of the installation at controller node and everything resulted without problems. After finished the installation direction at compute node the *systemctl status zun-compute* have the following errors root at compute /h/team# systemctl status zun-compute ● zun-compute.service - OpenStack Container Service Compute Agent Loaded: loaded (/etc/systemd/system/zun-compute.service; enabled; vendor preset: enabled) Active: active (running) since Tue 2019-04-30 16:46:56 UTC; 4h 26min ago Main PID: 2072 (zun-compute) Tasks: 1 (limit: 4915) CGroup: /system.slice/zun-compute.service └─2072 /usr/bin/python /usr/local/bin/zun-compute Apr 30 16:46:56 compute systemd[1]: Started OpenStack Container Service Compute Agent. Apr 30 16:46:57 compute zun-compute[2072]: 2019-04-30 16:46:57.929 2072 INFO zun.cmd.compute [-] Starting server in PID 2072 Apr 30 16:46:57 compute zun-compute[2072]: 2019-04-30 16:46:57.941 2072 INFO zun.container.driver [-] Loading container driver 'docker.driver.DockerDriver' Apr 30 16:46:58 compute zun-compute[2072]: 2019-04-30 16:46:58.028 2072 INFO zun.container.driver [-] Loading container driver 'docker.driver.DockerDriver' Apr 30 16:48:33 compute zun-compute[2072]: 2019-04-30 16:48:33.645 2072 INFO zun.image.driver [req-7e0b8325-1e09-4410-80f4-af807cbc0420 a16c6ef0319b4643a4ec8e56a1d025cb 59065d8f970b467aa94ef7b35f1edab5 default - -] Loading container image driver 'glance' Apr 30 16:48:33 compute zun-compute[2072]: 2019-04-30 16:48:33.911 2072 INFO zun.image.driver [req-7e0b8325-1e09-4410-80f4-af807cbc0420 a16c6ef0319b4643a4ec8e56a1d025cb 59065d8f970b467aa94ef7b35f1edab5 default - -] Loading container image driver 'glance' Apr 30 16:48:35 compute zun-compute[2072]: 2019-04-30 16:48:35.455 2072 INFO zun.image.driver [req-7e0b8325-1e09-4410-80f4-af807cbc0420 16c6ef0319b4643a4ec8e56a1d025cb 59065d8f970b467aa94ef7b35f1edab5 default - -] Loading container image driver 'glance' Apr 30 16:48:35 compute zun-compute[2072]: 2019-04-30 16:48:35.939 2072 ERROR zun.image.glance.driver [req-7e0b8325-1e09-4410-80f4-af807cbc0420 a16c6ef0319b4643a4ec8e56a1d025cb 59065d8f970b467aa94ef7b35f1edab5 default - -] Imae cirros was not found in glance: ImageNotFound: Image cirros could not be found. 
Apr 30 16:48:35 compute zun-compute[2072]: 2019-04-30 16:48:35.940 2072 INFO zun.image.driver [req-7e0b8325-1e09-4410-80f4-af807cbc0420 a16c6ef0319b4643a4ec8e56a1d025cb 59065d8f970b467aa94ef7b35f1edab5 default - -] Loading container image driver 'docker' Apr 30 16:48:55 compute zun-compute[2072]: 2019-04-30 16:48:55.011 2072 ERROR zun.compute.manager [req-7bfa764a-45b8-4e2f-ac70-84d8bb71b135 - - - - -] Error occurred while calling Docker create API: Docker internal error: 500 Server Error: Internal Server Error ("failed to update store for object typpe *libnetwork.endpointCnt: client: etcd member http://controller:2379 has no leader").: DockerError: Docker internal error: 500 Server Error: Internal Server Error ("failed to update store for object type *libnetwork.endpointtCnt: client: etcd member http://controller:2379 has no leader"). Also *systemctl status docker* show the next output root at compute /h/team# systemctl status docker ● docker.service - Docker Application Container Engine Loaded: loaded (/lib/systemd/system/docker.service; enabled; vendor preset: enabled) Drop-In: /etc/systemd/system/docker.service.d └─docker.conf, http-proxy.conf, https-proxy.conf Active: active (running) since Tue 2019-04-30 16:46:25 UTC; 4h 18min ago Docs: https://docs.docker.com Main PID: 1777 (dockerd) Tasks: 21 CGroup: /system.slice/docker.service └─1777 /usr/bin/dockerd --group zun -H tcp://compute:2375 -H unix:///var/run/docker.sock --cluster-store etcd://controller:2379 Apr 30 16:46:20 compute dockerd[1777]: time="2019-04-30T16:46:20.815305836Z" level=warning msg="Your kernel does not support cgroup rt runtime" Apr 30 16:46:20 compute dockerd[1777]: time="2019-04-30T16:46:20.815933695Z" level=info msg="Loading containers: start." Apr 30 16:46:24 compute dockerd[1777]: time="2019-04-30T16:46:24.378526837Z" level=info msg="Default bridge (docker0) is assigned with an IP address 17 Apr 30 16:46:24 compute dockerd[1777]: time="2019-04-30T16:46:24.572558877Z" level=info msg="Loading containers: done." Apr 30 16:46:25 compute dockerd[1777]: time="2019-04-30T16:46:25.198101219Z" level=info msg="Docker daemon" commit=e8ff056 graphdriver(s)=overlay2 vers Apr 30 16:46:25 compute dockerd[1777]: time="2019-04-30T16:46:25.198211373Z" level=info msg="Daemon has completed initialization" Apr 30 16:46:25 compute dockerd[1777]: time="2019-04-30T16:46:25.232286069Z" level=info msg="API listen on /var/run/docker.sock" Apr 30 16:46:25 compute dockerd[1777]: time="2019-04-30T16:46:25.232318790Z" level=info msg="API listen on 10.8.9.58:2375" Apr 30 16:46:25 compute systemd[1]: Started Docker Application Container Engine. Apr 30 16:48:55 compute dockerd[1777]: time="2019-04-30T16:48:55.009820439Z" level=error msg="Handler for POST /v1.26/networks/create returned error: failed to update store for object type *libnetwork.endpointCnt: client: etcd member http://controller:2379 has no leader" When i try to launch an app container as the guide says it shows an Error state and when i run opentack appcontainer show this is the reason of the error status_reason | Docker internal error: 500 Server Error: Internal Server Error ("failed to update store for object type *libnetwork.endpointCnt: client: etcd member http://controller:2379 has no leader") -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From openstack at hauschild.it Thu May 2 14:21:32 2019 From: openstack at hauschild.it (Hartwig Hauschild) Date: Thu, 2 May 2019 16:21:32 +0200 Subject: properly sizing openstack controlplane infrastructure In-Reply-To: <6448907c-6aaf-2f91-fe77-48e697c7b80f@debian.org> References: <20190430153021.jhdgri7g2nvpn5vj@alle-irre.de> <6448907c-6aaf-2f91-fe77-48e697c7b80f@debian.org> Message-ID: <20190502142131.llh7udkpgyhncb4d@alle-irre.de> Am 01.05.2019 schrieb Thomas Goirand: > On 4/30/19 5:30 PM, Hartwig Hauschild wrote: > > Also: We're currently running Neutron in OVS-DVR-VXLAN-Configuration. > > Does that properly scale up and above 50+ nodes > > It does, that's not the bottleneck. > Oh, Ok. I've read that OVS-DVR-VXLAN will produce a lot of load on the messaging-system, at least if you enable l2-pop and don't run broadcast. > From my experience, 3 heavy control nodes are really enough to handle > 200+ compute nodes. Though what you're suggesting (separating db & > rabbitmq-server in separate nodes) is a very good idea. > Ah, cool. Then I'll head that way and see how that works out (and how many add-on-services it can take) -- cheers, Hardy From johnsomor at gmail.com Thu May 2 14:44:38 2019 From: johnsomor at gmail.com (Michael Johnson) Date: Thu, 2 May 2019 08:44:38 -0600 Subject: [octavia] anchor discountinued? In-Reply-To: <53e59d83-03e2-a333-1277-03d02ba2120d@gmx.com> References: <78073332-bb86-b00b-6aaf-8e309cbcd160@gmx.com> <53e59d83-03e2-a333-1277-03d02ba2120d@gmx.com> Message-ID: Volodymyr, Correct, Anchor is no longer an OpenStack project and we need to remove the reference to it in our code. Currently there is not another option beyond the built in "local_cert_generator" for this function. Michael On Thu, May 2, 2019 at 7:05 AM Volodymyr Litovka wrote: > > Hi Sa, > > as far as I understand, Octavia uses Barbican for storing certs for TLS offload. While Anchor used for signing certs/keys when doing provisioning of Amphoraes. > > On 5/2/19 3:43 PM, Sa Pham wrote: > > Hi Volodymyr, > > You mean SSL Certificate for Octavia, You can use Barbican. > > > > On Thu, May 2, 2019 at 9:42 PM Volodymyr Litovka wrote: >> >> Dear colleagues, >> >> it seems Anchor, which is used by Octavia as PKI system, is >> discontinued. Is there replacement for Anchor which can be used with >> Octavia? >> >> Thank you. >> >> -- >> Volodymyr Litovka >> "Vision without Execution is Hallucination." -- Thomas Edison >> >> > > > -- > Sa Pham Dang > Master Student - Soongsil University > Kakaotalk: sapd95 > Skype: great_bn > > > > -- > Volodymyr Litovka > "Vision without Execution is Hallucination." -- Thomas Edison From johnsomor at gmail.com Thu May 2 14:58:34 2019 From: johnsomor at gmail.com (Michael Johnson) Date: Thu, 2 May 2019 08:58:34 -0600 Subject: [octavia] Error while creating amphora In-Reply-To: References: Message-ID: Volodymyr, It looks like you have enabled "user_data_config_drive" in the octavia.conf file. Is there a reason you need this? If not, please set it to False and it will resolve your issue. It appears we have a python3 bug in the "user_data_config_drive" capability. It is not generally used and appears to be missing test coverage. I have opened a story (bug) on your behalf here: https://storyboard.openstack.org/#!/story/2005553 Michael On Thu, May 2, 2019 at 4:29 AM Volodymyr Litovka wrote: > > Dear colleagues, > > I'm using Openstack Rocky and trying to launch Octavia 4.0.0. 
After all installation steps I've got an error during 'openstack loadbalancer create' with the following log: > > DEBUG octavia.controller.worker.tasks.compute_tasks [-] Compute create execute for amphora with id d037721f-2cf9-492e-99cb-0be5874da0f6 execute /opt/openstack/lib/python3.6/site-packages/octavia/controller/worker/tasks/compute_tasks.py:63 > ERROR octavia.controller.worker.tasks.compute_tasks [-] Compute create for amphora id: d037721f-2cf9-492e-99cb-0be5874da0f6 failed: TypeError: can't concat str to bytes > ERROR octavia.controller.worker.tasks.compute_tasks Traceback (most recent call last): > ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/octavia/controller/worker/tasks/compute_tasks.py", line 94, in execute > ERROR octavia.controller.worker.tasks.compute_tasks config_drive_files) > ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/octavia/common/jinja/user_data_jinja_cfg.py", line 38, in build_user_data_config > ERROR octavia.controller.worker.tasks.compute_tasks return self.agent_template.render(user_data=user_data) > ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/jinja2/asyncsupport.py", line 76, in render > ERROR octavia.controller.worker.tasks.compute_tasks return original_render(self, *args, **kwargs) > ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/jinja2/environment.py", line 1008, in render > ERROR octavia.controller.worker.tasks.compute_tasks return self.environment.handle_exception(exc_info, True) > ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/jinja2/environment.py", line 780, in handle_exception > ERROR octavia.controller.worker.tasks.compute_tasks reraise(exc_type, exc_value, tb) > ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/jinja2/_compat.py", line 37, in reraise > ERROR octavia.controller.worker.tasks.compute_tasks raise value.with_traceback(tb) > ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/octavia/common/jinja/templates/user_data_config_drive.template", line 29, in top-level template code > ERROR octavia.controller.worker.tasks.compute_tasks {{ value|indent(8) }} > ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/jinja2/filters.py", line 557, in do_indent > ERROR octavia.controller.worker.tasks.compute_tasks s += u'\n' # this quirk is necessary for splitlines method > ERROR octavia.controller.worker.tasks.compute_tasks TypeError: can't concat str to bytes > ERROR octavia.controller.worker.tasks.compute_tasks > WARNING octavia.controller.worker.controller_worker [-] Task 'STANDALONE-octavia-create-amp-for-lb-subflow-octavia-cert-compute-create' (06134192-def9-420c-9feb-0d08a068f3b2) transitioned into state 'FAILURE' from state 'RUNNING' > > Any advises where is the problem? > > My environment: > - Openstack Rocky > - Ubuntu 18.04 > - Octavia installed in virtualenv using pip install: > # pip list |grep octavia > octavia 4.0.0 > octavia-lib 1.1.1 > python-octaviaclient 1.8.0 > > Thank you. > > -- > Volodymyr Litovka > "Vision without Execution is Hallucination." -- Thomas Edison > > -- > Volodymyr Litovka > "Vision without Execution is Hallucination." 
-- Thomas Edison From jacob.anders.au at gmail.com Thu May 2 15:18:53 2019 From: jacob.anders.au at gmail.com (Jacob Anders) Date: Fri, 3 May 2019 01:18:53 +1000 Subject: [baremetal-sig][ironic][ptg] Bare-metal whitepaper meeting at PTG Message-ID: Hi All, As discussed in the forum session earlier in the week, I would like to put together a session at the PTG for the Bare-metal SIG members to discuss the Bare Metal Whitepaper work and plan out next steps. Ironic schedule for the PTG is pretty tight but how about 4pm on the Friday? We could do this as a breakout in the main Ironic session. Who would be interested/available? Thanks, cheers, Jacob -------------- next part -------------- An HTML attachment was scrubbed... URL: From miguel at mlavalle.com Thu May 2 15:25:29 2019 From: miguel at mlavalle.com (Miguel Lavalle) Date: Thu, 2 May 2019 09:25:29 -0600 Subject: [openstack-dev] [neutron] PTG agenda In-Reply-To: References: <1335a531-9e74-d900-07a9-a6aa4ce285f4@ericsson.com> Message-ID: Awesome, thanks for making sure everything is in working order :-) On Thu, May 2, 2019 at 7:41 AM Lajos Katona wrote: > Sorry, > > This is the PTG page: > https://wiki.openstack.org/wiki/PTG/Train/Etherpads > and of course neutron is there..... > > On 2019. 05. 02. 7:35, Lajos Katona wrote: > > Hi Miguel, > > > > Just a note, the pad is not on the "official" list of pads here: > > https://wiki.openstack.org/wiki/Forum/Denver2019 > > > > Regards > > Lajos > > > > On 2019. 04. 29. 16:46, Miguel Lavalle wrote: > >> Hi Neutrinos,, > >> > >> I took your proposals for PTG topics and organized them in an agenda. > >> Please look at > >> https://etherpad.openstack.org/p/openstack-networking-train-ptg. > >> Let's have a very productive meeting! > >> > >> Best regards > >> > >> Miguel > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pawel.konczalski at everyware.ch Thu May 2 15:48:22 2019 From: pawel.konczalski at everyware.ch (Pawel Konczalski) Date: Thu, 2 May 2019 17:48:22 +0200 Subject: kube_cluster_deploy fails In-Reply-To: <497c1efd-a2af-6958-7e11-ae8e38eb4df9@everyware.ch> References: <0f00a092-1f7d-e85b-9ce4-da38cfd2c9da@everyware.ch> <5080c19a-3c98-8c13-eec1-49706d3e591c@everyware.ch> <497c1efd-a2af-6958-7e11-ae8e38eb4df9@everyware.ch> Message-ID: <42689a26-358f-09b9-6dfe-5a5b57b916ba@everyware.ch> Also you have to ensure that swap is disabled in the Kubernetes master and minion flavor(s). 
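A quick way to double-check that a flavor defines no swap (flavor name taken
from the commands below; adjust to your own):

openstack flavor show m1.kubernetes -c swap

The swap field should be 0 or empty.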
Following commands should result in a working Kubernetes deploy process:

# Create image for Kubernetes VMs
wget https://download.fedoraproject.org/pub/alt/atomic/stable/Fedora-29-updates-20190429.0/AtomicHost/x86_64/images/Fedora-AtomicHost-29-20190429.0.x86_64.raw.xz
xz -d Fedora-AtomicHost-29-20190429.0.x86_64.raw.xz

openstack image create "Fedora AtomicHost 29" \
  --file Fedora-AtomicHost-29-20190429.0.x86_64.raw \
  --disk-format raw \
  --container-format=bare \
  --min-disk 10 \
  --min-ram 4096 \
  --public \
  --protected \
  --property os_distro=fedora-atomic \
  --property os_admin_user=fedora \
  --property os_version="20190429.0"

# Create flavor for Kubernetes cluster
openstack flavor create m1.kubernetes \
  --disk 40 \
  --vcpu 2 \
  --ram 4096 \
  --public

# Create Kubernetes template
openstack coe cluster template create kubernetes-cluster-template \
  --image "Fedora AtomicHost 29" \
  --external-network public \
  --dns-nameserver 8.8.8.8 \
  --master-flavor m1.kubernetes \
  --flavor m1.kubernetes \
  --coe kubernetes \
  --volume-driver cinder \
  --network-driver flannel \
  --docker-volume-size 40

# Create Kubernetes cluster
openstack coe cluster create kubernetes-cluster \
  --cluster-template kubernetes-cluster-template \
  --master-count 1 \
  --node-count 2 \
  --keypair mykey

BR

Pawel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 5227 bytes
Desc: not available
URL: 
From daniel at speichert.pl  Thu May 2 16:02:46 2019
From: daniel at speichert.pl (Daniel Speichert)
Date: Thu, 2 May 2019 18:02:46 +0200 (CEST)
Subject: [self-healing] live-migrate instance in response to fault signals
In-Reply-To: 
References: 
Message-ID: <1640608910.1064843.1556812966934.JavaMail.zimbra@speichert.pl>

----- Original Message -----
> From: "Eric K" 
> To: "openstack-discuss" 
> Sent: Wednesday, May 1, 2019 4:59:57 PM
> Subject: [self-healing] live-migrate instance in response to fault signals
...
>
> I just want to follow up to get more info on the context;
> specifically, which of the following pieces are the main difficulties?
> - detecting the failure/soft-fail/early failure indication
> - codifying how to respond to each failure scenario
> - triggering/executing the desired workflow
> - something else
>
> [1] https://etherpad.openstack.org/p/DEN-self-healing-SIG

We currently attempt to do all of the above using less-than-optimal custom
scripts (using openstacksdk) and pipelines (running Ansible).

I think there is tremendous value in developing at least one tested way to
do all of the above by connecting e.g. Monasca, Mistral and Nova together
to do the above. Maybe it's currently somewhat possible - then it's more of
a documentation issue that would benefit operators.

One of the derivative issues is the quality of live-migration in Nova.
(I don't have production-level experience with Rocky/Stein yet.)
When we do lots of live migrations, there is obviously a limit on the
number of live migrations happening at the same time (doing more would be
counter productive). These limits could be smarter/more dynamic in some
cases. There is no immediate action item here right now though.

I would like to begin with putting together all the pieces that currently
work together and go from there - see what's missing.
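To illustrate the kind of custom script I mean, the core of it is roughly an
openstacksdk loop like this (the hypervisor name is a placeholder,
credentials come from the usual clouds.yaml/environment, and there is no
error handling or rate limiting here):

import openstack

conn = openstack.connect()
# Drain servers off a hypervisor we believe is failing; leave host=None so
# the scheduler picks the destination.
for server in conn.compute.servers(all_projects=True, host='compute-01'):
    conn.compute.live_migrate_server(server, host=None)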
-Daniel From mriedemos at gmail.com Thu May 2 16:11:02 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 2 May 2019 10:11:02 -0600 Subject: [forum][sdk][nova] Closing compute API feature gaps in the openstack CLI - recap Message-ID: <799f4669-5c92-5cd5-f8ee-4e9a8baae35a@gmail.com> I wanted to give a quick recap of this Forum session for those that couldn't attend and also find owners. Please reply to this thread if you'd like to sign up for any specific item. The etherpad [1] has the details. To restate the goal: "Identify the major issues and functional gaps (up through Mitaka 2.25) and prioritize which to work on and what the commands should do." We spent the majority of the time talking about existing issues with compute API functionality in openstack CLI, primarily boot-from-volume, live migration and lack of evacuate support (evacuate as in rebuild on a new target host because the source host is dead, not drain a host with live migrations [2]). We then talked through some of the microversion gaps and picked a few to focus on. Based on that, the agreements and priorities are: **High Priority** 1. Make the boot-from-volume experience better by: a) Support type=image for the --block-device-mapping option. b) Add a --boot-from-volume option which will translate to a root --block-device-mapping using the provided --image value (nova will create the root volume under the covers). Owner: TBD (on either) 2. Fix the "openstack server migrate" command We're going to deprecate the --live option and add a new --live-migration option and a --host option. The --host option can be used for requesting a target host for cold migration (omit the --live/--live-migration option for that). Then in a major release we'll drop the --live option and intentionally not add a --force option (since we don't want to support forcing a target host and bypassing the scheduler). Owner: TBD (I would split the 2.56 cold migration --host support from the new --live-migration option review-wise) **Medium Priority** Start modeling migration resources in the openstack CLI, specifically for microversions 2.22-2.24, but note that the GET /os-migrations API is available since 2.1 (so that's probably easiest to add first). The idea is to have a new command resource like: openstack compute migration (list|delete|set) [--server ] Owner: TBD (again this is a series of changes) **Low Priority** Closing other feature gaps can be done on an as-needed basis as we've been doing today. Sean Mooney is working on adding evacuate support, and there are patches in flight (see [3]) for other microversion-specific features. I would like to figure out how to highlight these to the OSC core team on a more regular basis, but we didn't really talk about that. I've been trying to be a type of liaison for these patches and go over them before the core team tries to review them to make sure they match the API properly and are well documented. Does the OSC core team have any suggestions on how I can better socialize what I think is ready for core team review? 
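To make the boot-from-volume item a bit more concrete, the rough UX we
discussed looks something like this (illustrative only - neither form exists
in OSC today, the flavor/image/network names are placeholders, and the final
syntax may well differ):

# proposed type=image support in the existing option
openstack server create --flavor m1.small --network private \
  --block-device-mapping vda=cirros:image:10:true bfv-server-1

# proposed shortcut; nova creates the root volume from the --image value
openstack server create --flavor m1.small --network private \
  --image cirros --boot-from-volume bfv-server-1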
[1] https://etherpad.openstack.org/p/DEN-osc-compute-api-gaps [2] http://www.danplanet.com/blog/2016/03/03/evacuate-in-nova-one-command-to-confuse-us-all/ [3] https://etherpad.openstack.org/p/compute-api-microversion-gap-in-osc -- Thanks, Matt From zigo at debian.org Thu May 2 16:31:45 2019 From: zigo at debian.org (Thomas Goirand) Date: Thu, 2 May 2019 18:31:45 +0200 Subject: properly sizing openstack controlplane infrastructure In-Reply-To: <20190502142131.llh7udkpgyhncb4d@alle-irre.de> References: <20190430153021.jhdgri7g2nvpn5vj@alle-irre.de> <6448907c-6aaf-2f91-fe77-48e697c7b80f@debian.org> <20190502142131.llh7udkpgyhncb4d@alle-irre.de> Message-ID: On 5/2/19 4:21 PM, Hartwig Hauschild wrote: > Am 01.05.2019 schrieb Thomas Goirand: >> On 4/30/19 5:30 PM, Hartwig Hauschild wrote: >>> Also: We're currently running Neutron in OVS-DVR-VXLAN-Configuration. >>> Does that properly scale up and above 50+ nodes >> >> It does, that's not the bottleneck. >> > Oh, Ok. I've read that OVS-DVR-VXLAN will produce a lot of load on the > messaging-system, at least if you enable l2-pop and don't run broadcast. Yes, but that's really not a big problem for a 200+ nodes setup, especially if you dedicate 3 nodes for messaging. >> From my experience, 3 heavy control nodes are really enough to handle >> 200+ compute nodes. Though what you're suggesting (separating db & >> rabbitmq-server in separate nodes) is a very good idea. >> > Ah, cool. Then I'll head that way and see how that works out (and how many > add-on-services it can take) > From tony at bakeyournoodle.com Thu May 2 16:35:23 2019 From: tony at bakeyournoodle.com (Tony Breeds) Date: Thu, 2 May 2019 10:35:23 -0600 Subject: [stable] propose Tim Burke as stable core In-Reply-To: <1F014297-E404-49B6-BE09-61F4DA478AF5@not.mn> References: <1F014297-E404-49B6-BE09-61F4DA478AF5@not.mn> Message-ID: <20190502163522.GB32106@thor.bakeyournoodle.com> On Wed, May 01, 2019 at 05:14:25PM -0600, John Dickinson wrote: > Tim has been very active in proposing and maintaining patches to > Swift’s stable branches. Of recent (non-automated) backports, Tim has > proposed more than a third of them. Done. Given the smaller number of backports I've judged Tim's understanding of the stable policy from those rather than reviews. Yours Tony. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: From mthode at mthode.org Thu May 2 16:37:56 2019 From: mthode at mthode.org (Matthew Thode) Date: Thu, 2 May 2019 11:37:56 -0500 Subject: [requirements][qa][all] mock 3.0.0 released In-Reply-To: <4E536AA4-479B-4A84-A1D2-91FF8FAD122C@doughellmann.com> References: <4E536AA4-479B-4A84-A1D2-91FF8FAD122C@doughellmann.com> Message-ID: <20190502163756.srtgbyabxwj3ewvh@mthode.org> On 19-05-02 05:03:29, Doug Hellmann wrote: > There's a major version bump of one of our testing dependencies, so watch for new unit test job failures. > > Doug > > > Begin forwarded message: > > > > From: Chris Withers > > Subject: [TIP] mock 3.0.0 released > > Date: May 2, 2019 at 2:07:34 AM MDT > > To: "testing-in-python at lists.idyll.org" , Python List > > > > Hi All, > > > > I'm pleased to announce the release of mock 3.0.0: > > https://pypi.org/project/mock/ > > > > This brings to rolling backport up to date with cpython master. > > > > It's been a few years since the last release, so I'd be surprised if there weren't some problems. 
> > If you hit any issues, please pin to mock<3 and then: > > > > - If your issue relates to mock functionality, please report in the python tracker: https://bugs.python.org > > > > - If your issue is specific to the backport, please report here: https://github.com/testing-cabal/mock/issues > > > > If you're unsure, go for the second one and we'll figure it out. > > Ack, thanks for the notice -- Matthew Thode -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From miguel at mlavalle.com Thu May 2 16:57:22 2019 From: miguel at mlavalle.com (Miguel Lavalle) Date: Thu, 2 May 2019 10:57:22 -0600 Subject: [openstack-dev] [neutron] Team picture Message-ID: Dear Neutrinos, Please remember that we will have out team picture taken at 11:50, NEXT TO THE PTG REGISTRATION DESK. Please be there on time Best regards Miguel -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Thu May 2 17:12:44 2019 From: mark at stackhpc.com (Mark Goddard) Date: Thu, 2 May 2019 11:12:44 -0600 Subject: [Deploy problem] deploy openstack+monasca with kolla-ansible 8.0.0.0rc1. In-Reply-To: References: Message-ID: On Wed, 1 May 2019 at 17:10, Ming-Che Liu wrote: > Hello, > > I deployed openstack+monasca with kolla-ansible 8.0.0.0rc1. > > I follow the steps as mentioned in > https://docs.openstack.org/kolla-ansible/latest/user/quickstart.html > > The setting in my computer's globals.yml as same as [Quick Start] tutorial > (attached file: globals.yml is my setting). > > My machine environment as following: > OS: Ubuntu 16.04 > Kolla-ansible verions: 8.0.0.0rc1 > ansible version: 2.7 > > When I execute [bootstrap-servers] and [prechecks], it seems ok (no fatal > error or any interrupt). > > But when I execute [deploy], it will occur some error about rabbitmq(when > I set enable_rabbitmq:yes) and nova compute service(when I > set enable_rabbitmq:no). > > I have some detail screenshot about the errors as attached files, could > you please help me to solve this problem? > > Thank you very much. > > [Attached file description]: > globals.yml: my computer's setting about kolla-ansible > > As mentioned above, the following pictures show the errors, the rabbitmq > error will occur if I set [enable_rabbitmq:yes], the nova compute service > error will occur if I set [enable_rabbitmq:no]. > Hi Ming-Che, Since Stein, we no longer test Kolla Ansible with Ubuntu 16.04 upstream. Could you try again using Ubuntu 18.04? Regards, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at nemebean.com Thu May 2 19:18:00 2019 From: openstack at nemebean.com (Ben Nemec) Date: Thu, 2 May 2019 13:18:00 -0600 Subject: =?UTF-8?Q?Re=3a_=5boslo=5d_Proposing_Herv=c3=a9_Beraud_for_Oslo_Cor?= =?UTF-8?Q?e?= In-Reply-To: <75a2b34b-e46c-8361-1ab3-c910c95a6ecb@nemebean.com> References: <75a2b34b-e46c-8361-1ab3-c910c95a6ecb@nemebean.com> Message-ID: <14c9d853-0e48-dce7-23bc-48623dcbd3f3@nemebean.com> There were no objections and it's been a week, so I've added Hervé to the oslo-core team. Welcome! On 4/24/19 9:42 AM, Ben Nemec wrote: > Hi, > > Hervé has been working on Oslo for a while now and in that time has > shown tremendous growth in his understanding of Oslo and OpenStack. I > think he would make a good addition to the general Oslo core team. 
> Existing Oslo team members (+Keystone, Castellan, and anyone else we > co-own libraries with) please respond with +1/-1. If there are no > objections I'll add him to the ACL next week and we can celebrate in > person. :-) > > Thanks. > > -Ben > From openstack at fried.cc Thu May 2 19:31:02 2019 From: openstack at fried.cc (Eric Fried) Date: Thu, 2 May 2019 13:31:02 -0600 Subject: [nova][ptg] Summary: Stein Retrospective Message-ID: <5b287537-9489-b10e-4d52-7a4cbb617d0a@fried.cc> Etherpad: https://etherpad.openstack.org/p/nova-ptg-train-retrospective Summary: - At least one newcomer was very happy with the welcome/support he received. Let's keep up the encouragement and make new long-term contributors - Placement extraction went pretty well. Forum session had no negative energy. (Is this because we planned really well, or because it wasn't that big a deal to begin with?) - Great work and collaboration on the bandwidth series. Framework will set us up nicely for other uses as well (e.g. cyborg). - Runways work. Let's keep using them. - Release themes: some people benefit from their existence; for others they are irrelevant but harmless. So let's keep doing them, since they benefit some. - Good coordination & communication with the TripleO team. - Long commit chains are hard. Things that have helped some people, which should be encouraged for the future: hangouts, videos, and/or emails, supplementing specs, acting as "review guidance". - The Stein release seemed a little less focused than usual. No cause or action was identified. - Pre-PTG emails around Placement were very effective for some, suggested to be used in the future for Nova, though noting the limitations and restricting to the parts of the design process not appropriate for other forums (like specs). Actions: - do themes (to be discussed Saturday 1200) - keep doing runways - (mriedem) hangout/video/review-guide-email for cross-cell resize work - Consider pre-PTG email threads for U efried . From arbermejo0417 at gmail.com Thu May 2 19:55:23 2019 From: arbermejo0417 at gmail.com (Alejandro Ruiz Bermejo) Date: Thu, 2 May 2019 15:55:23 -0400 Subject: [Zun] openstack appcontainer run error Message-ID: I'm having troubles with the verify step of the Zun intallation at Openstack Queens on Ubuntu 18.04 LTS. I previously Posted a trouble with it and already fixed the error you guys pointed at. Now i still can't launch the app container. 
It freeze at container_creating task, the shows an error state root at controller /h/team# openstack appcontainer show 4a657ac5-058c-43eb-8cbf-7239ad3c4d76 +-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Field | Value | +-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | addresses | {} | | links | [{u'href': u' http://controller:9517/v1/containers/4a657ac5-058c-43eb-8cbf-7239ad3c4d76', u'rel': u'self'}, {u'href': u' http://controller:9517/containers/4a657ac5-058c-43eb-8cbf-7239ad3c4d76', u'rel': u'bookmark'}] | | image | cirros | | labels | {} | | disk | 0 | | security_groups | [] | | image_pull_policy | None | | user_id | a16c6ef0319b4643a4ec8e56a1d025cb | | uuid | 4a657ac5-058c-43eb-8cbf-7239ad3c4d76 | | hostname | None | | environment | {} | | memory | None | | project_id | 59065d8f970b467aa94ef7b35f1edab5 | | status | Error | | workdir | None | | auto_remove | False | | status_detail | None | | host | None | | image_driver | docker | | task_state | None | | status_reason | *Docker internal error: 500 Server Error: Internal Server Error ("failed to update store for object type *libnetwork.endpointCnt: client: endpoint http://10.8.9.54:2379 exceeded header timeout"). * | | name | test1 | | restart_policy | {} | | ports | [] | | command | "ping" "8.8.8.8" | | runtime | None | | cpu | None | | interactive | False | +-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ I tried to launch another container without defining a network and without executing any command, and it also had the same error. I can launch container from the computer node cli with docker commands, the errors are when i try to launch them from the controller CLI. I run a docker run hello-world at the compute node and everything went fine Wen u runned openstack appcontainer create hello-world i had exactly the same error -------------- next part -------------- An HTML attachment was scrubbed... URL: From james.page at canonical.com Thu May 2 21:57:49 2019 From: james.page at canonical.com (James Page) Date: Thu, 2 May 2019 15:57:49 -0600 Subject: [sig][upgrades] job done Message-ID: Hi All After this mornings PTG session, I'm pleased to propose that we formally end the Upgrades SIG. That’s a “pleased” because we feel that our job as a SIG is done! Upgrades in OpenStack are no longer a "special interest"; they are now an integral part of the philosophy of projects within the OpenStack ecosystem and although there are probably still some rough edges, we don’t think we need a SIG to drive this area forward any longer. So thanks for all of the war stories, best practice discussion and general upgrade related conversation over the Forums, PTG’s and Summits over the last few years - it's been fun! Regards James -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From alifshit at redhat.com Thu May 2 22:02:50 2019 From: alifshit at redhat.com (Artom Lifshitz) Date: Thu, 2 May 2019 16:02:50 -0600 Subject: [forum][sdk][nova] Closing compute API feature gaps in the openstack CLI - recap In-Reply-To: <799f4669-5c92-5cd5-f8ee-4e9a8baae35a@gmail.com> References: <799f4669-5c92-5cd5-f8ee-4e9a8baae35a@gmail.com> Message-ID: On Thu, May 2, 2019 at 10:15 AM Matt Riedemann wrote: > > I wanted to give a quick recap of this Forum session for those that > couldn't attend and also find owners. Please reply to this thread if > you'd like to sign up for any specific item. The etherpad [1] has the > details. > > To restate the goal: "Identify the major issues and functional gaps (up > through Mitaka 2.25) and prioritize which to work on and what the > commands should do." > > We spent the majority of the time talking about existing issues with > compute API functionality in openstack CLI, primarily boot-from-volume, > live migration and lack of evacuate support (evacuate as in rebuild on a > new target host because the source host is dead, not drain a host with > live migrations [2]). > > We then talked through some of the microversion gaps and picked a few to > focus on. > > Based on that, the agreements and priorities are: > > **High Priority** > > 1. Make the boot-from-volume experience better by: > > a) Support type=image for the --block-device-mapping option. > > b) Add a --boot-from-volume option which will translate to a root > --block-device-mapping using the provided --image value (nova will > create the root volume under the covers). > > Owner: TBD (on either) I can take this. I'll also work on all the device tagging stuff - both tagged attach in 2.49 ([3] L122) and the original tagged boot devices in 2.32 (which I've added to [3] as a quick note). > 2. Fix the "openstack server migrate" command > > We're going to deprecate the --live option and add a new > --live-migration option and a --host option. The --host option can be > used for requesting a target host for cold migration (omit the > --live/--live-migration option for that). Then in a major release we'll > drop the --live option and intentionally not add a --force option (since > we don't want to support forcing a target host and bypassing the scheduler). > > Owner: TBD (I would split the 2.56 cold migration --host support from > the new --live-migration option review-wise) > > **Medium Priority** > > Start modeling migration resources in the openstack CLI, specifically > for microversions 2.22-2.24, but note that the GET /os-migrations API is > available since 2.1 (so that's probably easiest to add first). The idea > is to have a new command resource like: > > openstack compute migration (list|delete|set) [--server ] > > Owner: TBD (again this is a series of changes) > > **Low Priority** > > Closing other feature gaps can be done on an as-needed basis as we've > been doing today. Sean Mooney is working on adding evacuate support, and > there are patches in flight (see [3]) for other microversion-specific > features. > > I would like to figure out how to highlight these to the OSC core team > on a more regular basis, but we didn't really talk about that. I've been > trying to be a type of liaison for these patches and go over them before > the core team tries to review them to make sure they match the API > properly and are well documented. Does the OSC core team have any > suggestions on how I can better socialize what I think is ready for core > team review? 
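For illustration, a rough sketch of what the proposed CLI could look like once the high-priority items land (option names and arguments are provisional, per the session, so treat this as a sketch rather than the final interface):

  # boot from volume, letting nova create the root volume from the image
  # (whether --boot-from-volume takes a size argument is still open)
  openstack server create --flavor m1.small --image cirros \
      --boot-from-volume 10 --network private my-server

  # cold migrate to a requested target host (no --live/--live-migration)
  openstack server migrate --host compute2 my-server

  # live migrate, letting the scheduler pick the destination
  openstack server migrate --live-migration my-server

  # list migration records for a server (the proposed new resource)
  openstack compute migration list --server my-server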
> > [1] https://etherpad.openstack.org/p/DEN-osc-compute-api-gaps > [2] > http://www.danplanet.com/blog/2016/03/03/evacuate-in-nova-one-command-to-confuse-us-all/ > [3] https://etherpad.openstack.org/p/compute-api-microversion-gap-in-osc > > -- > > Thanks, > > Matt > > > -- Artom Lifshitz Software Engineer, OpenStack Compute DFG From thierry at openstack.org Thu May 2 22:03:36 2019 From: thierry at openstack.org (Thierry Carrez) Date: Fri, 3 May 2019 00:03:36 +0200 Subject: [sig][upgrades] job done In-Reply-To: References: Message-ID: James Page wrote: > After this mornings PTG session, I'm pleased to propose that we formally > end the Upgrades SIG. > > That’s a “pleased” because we feel that our job as a SIG is done! > > Upgrades in OpenStack are no longer a "special interest"; they are now > an integral part of the philosophy of projects within the OpenStack > ecosystem and although there are probably still some rough edges, we > don’t think we need a SIG to drive this area forward any longer. > > So thanks for all of the war stories, best practice discussion and > general upgrade related conversation over the Forums, PTG’s and Summits > over the last few years - it's been fun! That makes a lot of sense. Upgrades are (1) in a much better shape than a couple of years ago, and (2) are now a general concern with work happening on every team (as the upgrade-checks goal in Stein showed), so a SIG is a bit redundant. Thanks James for your help driving this ! -- Thierry Carrez (ttx) From james.page at canonical.com Thu May 2 22:08:42 2019 From: james.page at canonical.com (James Page) Date: Thu, 2 May 2019 16:08:42 -0600 Subject: [sig][upgrades] job done In-Reply-To: References: Message-ID: On Thu, May 2, 2019 at 3:57 PM James Page wrote: > Hi All > > After this mornings PTG session, I'm pleased to propose that we formally > end the Upgrades SIG. > > That’s a “pleased” because we feel that our job as a SIG is done! > > Upgrades in OpenStack are no longer a "special interest"; they are now an > integral part of the philosophy of projects within the OpenStack ecosystem > and although there are probably still some rough edges, we don’t think we > need a SIG to drive this area forward any longer. > Making this more formal as a proposal - https://review.opendev.org/656878 Cheers James -------------- next part -------------- An HTML attachment was scrubbed... URL: From pshchelokovskyy at mirantis.com Thu May 2 22:19:49 2019 From: pshchelokovskyy at mirantis.com (Pavlo Shchelokovskyy) Date: Thu, 2 May 2019 16:19:49 -0600 Subject: [keystone][heat] security_compliance options and auto-created users In-Reply-To: <7c744974-a22d-517b-765d-d5ea9912d953@redhat.com> References: <7c744974-a22d-517b-765d-d5ea9912d953@redhat.com> Message-ID: Hi all, to follow up on this, I created the following issues: Heat story https://storyboard.openstack.org/#!/story/2005210 , first patch is up https://review.opendev.org/#/c/656884/ Keystone bugs https://bugs.launchpad.net/keystone/+bug/1827431 https://bugs.launchpad.net/keystone/+bug/1827435 I'll work on patches to Keystone next, please review / comment on bugs/stories/patches :-) Cheers, On Wed, Apr 17, 2019 at 9:42 AM Zane Bitter wrote: > On 16/04/19 6:38 AM, Pavlo Shchelokovskyy wrote: > > Hi all, > > > > I am currently looking at options defined in [security_compliance] > > section of keystone.conf [0] and trying to understand how enabling those > > security features may affect other services. 
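For reference, the options being discussed live in the [security_compliance] section of keystone.conf; a minimal sketch (the option names are the ones listed below, the values are purely illustrative) looks like:

  [security_compliance]
  disable_user_account_days_inactive = 90
  lockout_failure_attempts = 5
  lockout_duration = 1800
  password_expires_days = 90
  unique_last_password_count = 5
  change_password_upon_first_use = True
  password_regex = ^(?=.*\d)(?=.*[a-zA-Z]).{8,}$
  password_regex_description = At least 8 characters, with both letters and digits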
> > > > The first thing I see is that any project that auto-creates some > > temporary users may be affected. > > Of the top of my head I can recall only Heat and Tempest doing this. > > For Tempest situation is easier as a) tempest can use static credentials > > instead of dynamic ones so it is possible to craft appropriate users > > beforehand and b) those users are relatively short-lived (required for > > limited time). > > In case of Heat though those users are used for deferred auth (like in > > autoscaling) which for long lived stacks can happen at arbitrary time in > > future - which is a problem. > > > > Below is breakdown of options/features possible to set and what problems > > that may pose for Heat and ideas on how to work those around: > > > > - disable_user_account_days_inactive - this may pose a problem for > > deferred auth, and it seems is not possible to override it via user > > "options". IMO there's a need to add a new user option to Keystone to > > ignore this setting for given user, and then use it in Heat to create > > temporary users. > > +1 > > > - lockout failure options (lockout_failure_attempts, lockout_duration) - > > can be overridden by user option, but Heat has to set it first. Also the > > question remains how realistically such problem may arise for an > > auto-created internal user and whether Heat should set this option at all > > Sounds like a DoS waiting to happen if we don't override. > > > - password expiry options > > > (password_expires_days, unique_last_password_count, minimum_password_age) - > > poses a problem for deferred auth, but can be overridden by user option, > > so Heat should definitely set it IMO for users it creates > > +1 > > > - change_password_upon_first_use - poses problem for auto-generated > > users, can be overridden by a user option, but Heat must set it for its > > generated users > > +1 > > > - password strength enforcement > > (password_regex, password_regex_description) - now this is an > > interesting one. Currently Heat creates passwords for its temporary > > users with this function [1] falling back to empty password if a > > resource is not generating one for itself. Depending on regex_password > > setting in keystone, it may or may not be enough to pass the password > > strength check. > > This is technically true, although I implemented it so it should pass > all but the most brain-dead of policies. So I feel like doing nothing is > a valid option ;) > > > I've looked around and (as expected) generating a random string which > > satisfies a pre-defined arbitrary regex is quite a non-trivial task, > > couple of existing Python libraries that can do this note that they > > support only a limited subset of full regex spec. > > Yeah. If we're going to do it I think a more achievable way is by making > the current generator's rules (which essentially consist of a length > plus minimum counts of characters from particular classes) configurable > instead of hard-coded. I always assumed that we might eventually do > this, but didn't build it in at the start because the patch needed to be > backported. > > This is still pretty terrible because it's a configuration option the > operator has to set to match keystone's, and in a different format to > boot. Although, TBH a regex isn't a great choice for how to configure it > in keystone either - it's trivial if you want to force the user to > always use the password "password", but if you want to force the user to > e.g. 
have both uppercase and lowercase characters then you have to do > all kinds of weird lookahead assertions that require a PhD in Python's > specific flavour of regexps. > > As long as we don't try to do something like > https://review.openstack.org/436324 > > Note that Heat has it's own requirements too - one that I discovered is > that the passwords can't contain '$' because of reasons. > > > So it seems that a most simple solution would be to add yet another user > > option to Keystone to ignore password strength enforcement for this > > given user, and amend Heat to set this option as well for internal users > > it creates. > > That also works. > > > We in Heat may also think as to whether it would have any benefit to > > also set the 'lock_password' user option for the auto-created users > > which will prohibit such users to change their passwords via API > themselves. > > I can't think of any real benefit - or for that matter any real harm. > Presumably Heat itself would still be able to change the account's > password later, so it wouldn't stop us from implementing some sort of > rotation thing in the future. > > > I'd very like to hear opinion from Keystone community as most solutions > > I named are 'add new user option to Keystone' :-) > > > > [0] > > > https://opendev.org/openstack/keystone/src/branch/master/keystone/conf/security_compliance.py > > [1] > > > https://opendev.org/openstack/heat/src/branch/master/heat/common/password_gen.py#L112 > > > > Cheers, > > - Pavlo > > -- > > Dr. Pavlo Shchelokovskyy > > Principal Software Engineer > > Mirantis Inc > > www.mirantis.com > > > -- Dr. Pavlo Shchelokovskyy Principal Software Engineer Mirantis Inc www.mirantis.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From stig.openstack at telfer.org Thu May 2 22:31:42 2019 From: stig.openstack at telfer.org (Stig Telfer) Date: Thu, 2 May 2019 16:31:42 -0600 Subject: [baremetal-sig] Planet for syndicating bare metal activity? Message-ID: Hi all - Good to see all the activity around bare metal this week. To keep information flowing, would it make sense to implement something like a planet feed for syndicating baremetal blog content from program members, linked to from the landing page https://www.openstack.org/bare-metal/ ? Cheers, Stig -------------- next part -------------- An HTML attachment was scrubbed... URL: From morgan.fainberg at gmail.com Thu May 2 22:30:58 2019 From: morgan.fainberg at gmail.com (Morgan Fainberg) Date: Thu, 2 May 2019 15:30:58 -0700 Subject: [keystone][heat] security_compliance options and auto-created users In-Reply-To: References: <7c744974-a22d-517b-765d-d5ea9912d953@redhat.com> Message-ID: There has been some work to allow for "defaults" for these overrides at, for example, the domain level (all users within a domain). Allowing such defaults based upon ownership would solve the concerns. 
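For context, the per-user overrides discussed in this thread are set through the "options" attribute on the user resource in the Keystone v3 API; a sketch using the resource options that already exist today (the new "ignore" options proposed in the bugs above would presumably follow the same pattern):

  PATCH /v3/users/{user_id}
  {
      "user": {
          "options": {
              "ignore_password_expiry": true,
              "ignore_change_password_upon_first_use": true,
              "ignore_lockout_failure_attempts": true,
              "lock_password": true
          }
      }
  }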
On Thu, May 2, 2019 at 3:22 PM Pavlo Shchelokovskyy < pshchelokovskyy at mirantis.com> wrote: > Hi all, > > to follow up on this, I created the following issues: > Heat story https://storyboard.openstack.org/#!/story/2005210 , first > patch is up https://review.opendev.org/#/c/656884/ > Keystone bugs https://bugs.launchpad.net/keystone/+bug/1827431 > https://bugs.launchpad.net/keystone/+bug/1827435 > > I'll work on patches to Keystone next, please review / comment on > bugs/stories/patches :-) > > Cheers, > > On Wed, Apr 17, 2019 at 9:42 AM Zane Bitter wrote: > >> On 16/04/19 6:38 AM, Pavlo Shchelokovskyy wrote: >> > Hi all, >> > >> > I am currently looking at options defined in [security_compliance] >> > section of keystone.conf [0] and trying to understand how enabling >> those >> > security features may affect other services. >> > >> > The first thing I see is that any project that auto-creates some >> > temporary users may be affected. >> > Of the top of my head I can recall only Heat and Tempest doing this. >> > For Tempest situation is easier as a) tempest can use static >> credentials >> > instead of dynamic ones so it is possible to craft appropriate users >> > beforehand and b) those users are relatively short-lived (required for >> > limited time). >> > In case of Heat though those users are used for deferred auth (like in >> > autoscaling) which for long lived stacks can happen at arbitrary time >> in >> > future - which is a problem. >> > >> > Below is breakdown of options/features possible to set and what >> problems >> > that may pose for Heat and ideas on how to work those around: >> > >> > - disable_user_account_days_inactive - this may pose a problem for >> > deferred auth, and it seems is not possible to override it via user >> > "options". IMO there's a need to add a new user option to Keystone to >> > ignore this setting for given user, and then use it in Heat to create >> > temporary users. >> >> +1 >> >> > - lockout failure options (lockout_failure_attempts, lockout_duration) >> - >> > can be overridden by user option, but Heat has to set it first. Also >> the >> > question remains how realistically such problem may arise for an >> > auto-created internal user and whether Heat should set this option at >> all >> >> Sounds like a DoS waiting to happen if we don't override. >> >> > - password expiry options >> > >> (password_expires_days, unique_last_password_count, minimum_password_age) - >> > poses a problem for deferred auth, but can be overridden by user >> option, >> > so Heat should definitely set it IMO for users it creates >> >> +1 >> >> > - change_password_upon_first_use - poses problem for auto-generated >> > users, can be overridden by a user option, but Heat must set it for its >> > generated users >> >> +1 >> >> > - password strength enforcement >> > (password_regex, password_regex_description) - now this is an >> > interesting one. Currently Heat creates passwords for its temporary >> > users with this function [1] falling back to empty password if a >> > resource is not generating one for itself. Depending on regex_password >> > setting in keystone, it may or may not be enough to pass the password >> > strength check. >> >> This is technically true, although I implemented it so it should pass >> all but the most brain-dead of policies. 
So I feel like doing nothing is >> a valid option ;) >> >> > I've looked around and (as expected) generating a random string which >> > satisfies a pre-defined arbitrary regex is quite a non-trivial task, >> > couple of existing Python libraries that can do this note that they >> > support only a limited subset of full regex spec. >> >> Yeah. If we're going to do it I think a more achievable way is by making >> the current generator's rules (which essentially consist of a length >> plus minimum counts of characters from particular classes) configurable >> instead of hard-coded. I always assumed that we might eventually do >> this, but didn't build it in at the start because the patch needed to be >> backported. >> >> This is still pretty terrible because it's a configuration option the >> operator has to set to match keystone's, and in a different format to >> boot. Although, TBH a regex isn't a great choice for how to configure it >> in keystone either - it's trivial if you want to force the user to >> always use the password "password", but if you want to force the user to >> e.g. have both uppercase and lowercase characters then you have to do >> all kinds of weird lookahead assertions that require a PhD in Python's >> specific flavour of regexps. >> >> As long as we don't try to do something like >> https://review.openstack.org/436324 >> >> Note that Heat has it's own requirements too - one that I discovered is >> that the passwords can't contain '$' because of reasons. >> >> > So it seems that a most simple solution would be to add yet another >> user >> > option to Keystone to ignore password strength enforcement for this >> > given user, and amend Heat to set this option as well for internal >> users >> > it creates. >> >> That also works. >> >> > We in Heat may also think as to whether it would have any benefit to >> > also set the 'lock_password' user option for the auto-created users >> > which will prohibit such users to change their passwords via API >> themselves. >> >> I can't think of any real benefit - or for that matter any real harm. >> Presumably Heat itself would still be able to change the account's >> password later, so it wouldn't stop us from implementing some sort of >> rotation thing in the future. >> >> > I'd very like to hear opinion from Keystone community as most solutions >> > I named are 'add new user option to Keystone' :-) >> > >> > [0] >> > >> https://opendev.org/openstack/keystone/src/branch/master/keystone/conf/security_compliance.py >> > [1] >> > >> https://opendev.org/openstack/heat/src/branch/master/heat/common/password_gen.py#L112 >> > >> > Cheers, >> > - Pavlo >> > -- >> > Dr. Pavlo Shchelokovskyy >> > Principal Software Engineer >> > Mirantis Inc >> > www.mirantis.com >> >> >> > > -- > Dr. Pavlo Shchelokovskyy > Principal Software Engineer > Mirantis Inc > www.mirantis.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From MZavala at StateStreet.com Thu May 2 22:59:07 2019 From: MZavala at StateStreet.com (Zavala, Miguel) Date: Thu, 2 May 2019 22:59:07 +0000 Subject: [Desginate][Infoblox] Using infoblox as a backend to designate Message-ID: Hi all, Ive been trying to get Infoblox integrated with designate and I am running into some issues. Currently I can go to horizon, and create a zone there that then shows in infoblox, but when checking the logs I get :: Could not find 1556226600 for openstack.example. 
on enough nameservers.:: I saw the documentation listed here ,https://docs.openstack.org/designate/queens/admin/backends/infoblox.html, says that I have to set the designate mini-dns server as my external primary. Do I have to have a mini-dns running in order for designate to operate correctly? Im asking because designate has a database so it does not require synchronization like bind 9 does. I currently have a mini-dns setup on my controller node if I do need it. Thank you for reading! Regards, Miguel -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongbin034 at gmail.com Fri May 3 01:12:06 2019 From: hongbin034 at gmail.com (Hongbin Lu) Date: Thu, 2 May 2019 21:12:06 -0400 Subject: [Zun] openstack appcontainer run error In-Reply-To: References: Message-ID: Hi Alejandro, The error message "http://10.8.9.54:2379 exceeded header timeout" indicates that Docker Daemon was not able to access the URL "http://10.8.9.54:2379", which is supposed to be the ETCD endpoint. If you run "curl http://10.8.9.54:2379" in compute host. Are you able to reach that endpoint? Best regards, Hongbin On Thu, May 2, 2019 at 3:58 PM Alejandro Ruiz Bermejo < arbermejo0417 at gmail.com> wrote: > I'm having troubles with the verify step of the Zun intallation at > Openstack Queens on Ubuntu 18.04 LTS. I previously Posted a trouble with it > and already fixed the error you guys pointed at. Now i still can't launch > the app container. It freeze at container_creating task, the shows an > error state > > root at controller /h/team# openstack appcontainer show > 4a657ac5-058c-43eb-8cbf-7239ad3c4d76 > > +-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ > | Field | Value > > > | > > +-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ > | addresses | {} > > > | > | links | [{u'href': u' > http://controller:9517/v1/containers/4a657ac5-058c-43eb-8cbf-7239ad3c4d76', > u'rel': u'self'}, {u'href': u' > http://controller:9517/containers/4a657ac5-058c-43eb-8cbf-7239ad3c4d76', > u'rel': u'bookmark'}] | > | image | cirros > > > | > | labels | {} > > > | > | disk | 0 > > > | > | security_groups | [] > > > | > | image_pull_policy | None > > > | > | user_id | a16c6ef0319b4643a4ec8e56a1d025cb > > > | > | uuid | 4a657ac5-058c-43eb-8cbf-7239ad3c4d76 > > > | > | hostname | None > > > | > | environment | {} > > > | > | memory | None > > > | > | project_id | 59065d8f970b467aa94ef7b35f1edab5 > > > | > | status | Error > > > | > | workdir | None > > > | > | auto_remove | False > > > | > | status_detail | None > > > | > | host | None > > > | > | image_driver | docker > > > | > | task_state | None > > > | > | status_reason | *Docker internal error: 500 Server Error: Internal > Server Error ("failed to update store for object type > *libnetwork.endpointCnt: client: endpoint http://10.8.9.54:2379 > exceeded header timeout"). 
* | > | name | test1 > > > | > | restart_policy | {} > > > | > | ports | [] > > > | > | command | "ping" "8.8.8.8" > > > | > | runtime | None > > > | > | cpu | None > > > | > | interactive | False > > > | > > +-------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ > > > I tried to launch another container without defining a network and without > executing any command, and it also had the same error. > I can launch container from the computer node cli with docker commands, > the errors are when i try to launch them from the controller CLI. > I run a docker run hello-world at the compute node and everything went > fine > Wen u runned openstack appcontainer create hello-world i had exactly the > same error > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at fried.cc Fri May 3 03:31:55 2019 From: openstack at fried.cc (Eric Fried) Date: Thu, 2 May 2019 21:31:55 -0600 Subject: [nova][ptg] Summary: CPU modeling in placement Message-ID: Spec: https://review.openstack.org/#/c/555081/ Summary: Rework the way logical processors are represented/requested in conf/placement/flavors. Stephen has at this point simplified the spec dramatically, reducing it to what may be the smallest possible cohesive unit of work. Even so, we all agreed that this is ugly and messy and will never be perfect. It was therefore agreed to... Action: approve the spec pretty much as is, start slinging code, make progress, refine as we go. efried . From openstack at fried.cc Fri May 3 03:47:58 2019 From: openstack at fried.cc (Eric Fried) Date: Thu, 2 May 2019 21:47:58 -0600 Subject: [nova][ptg] Summary: Persistent Memory Message-ID: <374d31c8-39ad-5cba-3827-794dc8a45757@fried.cc> Specs: - Base: https://review.openstack.org/601596 - Libvirt: https://review.openstack.org/622893 Patches: https://review.openstack.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/virtual-persistent-memory Summary/agreements: - Support persistent memory in units of "namespaces" with custom resource class names. - Namespaces to be pre-carved out by admin/deployer (not by Nova). - Custom RC names mapped to byte sizes via "conf" [1] so virt driver can know how to map them back to the real resources. - "Ignore NUMA for now" (sean-k-mooney will have to tell you what that means exactly). - Spec needs to list support-or-not for all instance lifecycle operations. - Keep one spec for base enablement and one for libvirt, but make sure the right bits are in the right spec. efried [1] There has been a recurring theme of needing "some kind of config" - not necessarily nova.conf or any oslo.config - that can describe: - Resource provider name/uuid/parentage, be it an existing provider (the compute root RP in this case) or a new nested provider; - Inventory (e.g. pmem namespace resource in this case); - Physical resource(s) to which the inventory corresponds; - Traits, aggregates, other? As of this writing, no specifics have been decided, even to the point of positing that it could be the same file for some/all of the specs for which the issue arose. 
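To make footnote [1] a bit more concrete, one purely hypothetical shape such a descriptor could take (nothing here is decided, and every name below is made up for illustration):

  providers:
    - identification:
        name: $COMPUTE_NODE          # attach inventory to the existing root provider
      inventories:
        CUSTOM_PMEM_NAMESPACE_128G:
          total: 4
      physical_mapping:
        CUSTOM_PMEM_NAMESPACE_128G: [ns0, ns1, ns2, ns3]
      traits:
        - CUSTOM_HW_PMEM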
From openstack at fried.cc Fri May 3 03:59:15 2019 From: openstack at fried.cc (Eric Fried) Date: Thu, 2 May 2019 21:59:15 -0600 Subject: [nova][ptg] Summary: Using Forbidden Aggregates Message-ID: <18a3542d-68d2-fd29-253f-880e54f12369@fried.cc> Spec: https://review.opendev.org/#/c/609960/ Summary: - TL;DR: Allows you to say "You can't land on a host that does X unless you specifically require X". Example: Keep my Windows-licensed hosts for Windows instances. - Exploit placement enablement for forbidden aggregates [1] in Nova - Set (nova) aggregate metadata with a syntax similar/identical to that of extra_specs for required traits (e.g. 'trait:CUSTOM_WINDOWS_ONLY': 'required') - During scheduling, nova will discover all aggregates with metadata of this form. For each: - Construct a list of the traits in the aggregate metadata - Subtract traits required by the server request's flavor+image. - If any traits from the aggregate remain, add this aggregate's UUID (which corresponds to a placement aggregate) to the list of "forbidden aggregates" for the GET /allocation_candidates request. Agreements: - The "discover all aggregates" bit has the potential to be slow, but is better than the alternative, which was having the admin supply the same information in a confusing conf syntax. And if performance becomes a problem, we can deal with it later; this does not paint us into a corner. - Spec has overall support, but a few open questions. Answer those, and we're good to approve and move forward. efried [1] https://docs.openstack.org/placement/latest/specs/train/approved/2005297-negative-aggregate-membership.html From openstack at fried.cc Fri May 3 04:03:55 2019 From: openstack at fried.cc (Eric Fried) Date: Thu, 2 May 2019 22:03:55 -0600 Subject: [nova][ptg] Summary: Corner case issues with root volume detach/attach Message-ID: Etherpad: https://etherpad.openstack.org/p/detach-attach-root-volume-corner-cases Summary (copied verbatim from the bottom of the etherpad - ask mriedem if further explanation is needed): - Allow attaching a new root volume with a tag as described [in the etherpad] and/or a multiattach volume, don't restrict on whether or not the existing root volume had a tag or multiattach capability. - During unshelve, before scheduling, modify the RequestSpec (and don't persist it) if the BDMs have a tag or are multiattach (this is honestly an existing bug for unshelve). This is where the compute driver capability traits will be used for pre-filtering. (if unshelving fails with NoValidHost the instance remains in shelve_offloaded state [tested in devstack], so user can detach the volume and retry) - Refactor and re-use the image validation code from rebuild when a new root volume is attached. - Assert the new root volume is bootable. efried . From openstack at fried.cc Fri May 3 04:11:09 2019 From: openstack at fried.cc (Eric Fried) Date: Thu, 2 May 2019 22:11:09 -0600 Subject: [nova][ptg] Summary: Extra specs validation Message-ID: <07673fec-c193-1031-b9f0-5d32c65cc124@fried.cc> Spec: https://review.openstack.org/#/c/638734/ Summary: Schema for syntactic validation of flavor extra specs, mainly to address otherwise-silently-ignored fat-fingering of keys and/or values. Agreements: - Do it in the flavor API when extra specs are set (as opposed to e.g. during server create) - One spec, but two stages: 1) For known keys, validate values; do this without a microversion. 
2) Validate keys, which entails - Standard set of keys (by pattern) known to nova - Mechanism for admin to extend the set for snowflake extra specs specific to their deployment / OOT driver / etc. - "Validation" will at least comprise messaging/logging. - Optional "strict mode" making the operation fail is also a possibility. efried . From openstack at fried.cc Fri May 3 04:14:34 2019 From: openstack at fried.cc (Eric Fried) Date: Thu, 2 May 2019 22:14:34 -0600 Subject: [nova][ptg] Summary: docs Message-ID: Summary: Nova docs could use some love. Agreement: Consider doc scrub as a mini-theme (cycle themes to be discussed Saturday) to encourage folks to dedicate some amount of time to reading & validating docs, and opening and/or fixing bugs for discovered issues. efried . From openstack at fried.cc Fri May 3 04:22:21 2019 From: openstack at fried.cc (Eric Fried) Date: Thu, 2 May 2019 22:22:21 -0600 Subject: [nova][ptg] Summary: Next Steps for QoS Bandwidth Message-ID: <03bedee9-f1c2-dac1-af2e-83408cbe66d9@fried.cc> Blueprints/Specs: - Live migration support: https://review.opendev.org/#/c/652608 - Grab bag for other stuff: https://blueprints.launchpad.net/nova/+spec/enhance-support-for-ports-having-resource-request - Request group to resource provider mapping (needs to be moved to a placement spec): https://review.opendev.org/#/c/597601/ Agreements: - No microversion for adding migration operation support (or at least propose it that way in the spec and let discussion ensue there) - Can the live migration support depend on the existence of multiple portbinding or we have to support the old codepath as well when the port binding is created by the nova-compute on the destination host? => Yes, this extension cannot be turned off - Pull the trigger on rg-to-rp mappings in placement. This is also needed by other efforts (cyborg and VGPU at least). - Tag PFs in the PciDeviceSpec, and tag the corresponding RP indicating that it can do that. Require the trait - refuse to land on a host that can't do this, because the assignment will fail late. - Default group_policy=none and do the post-filtering on the nova side More discussion related to this topic may occur in the nova/neutron cross-project session, scheduled for Friday at 1400 in the Nova room: https://etherpad.openstack.org/p/ptg-train-xproj-nova-neutron efried . From jeremyfreudberg at gmail.com Fri May 3 04:27:13 2019 From: jeremyfreudberg at gmail.com (Jeremy Freudberg) Date: Fri, 3 May 2019 00:27:13 -0400 Subject: [ironic][neutron][ops] Ironic multi-tenant networking, VMs In-Reply-To: References: Message-ID: Thanks Julia; this is helpful. Thanks also for reading my mind a bit, as I am thinking of the VXLAN case... I can't help but notice that in the Ironic CI jobs, multi tenant networking being used seems to entail VLANs as the tenant network type (instead of VXLAN). Is it just coincidence / how the gate just is, or is it hinting something about how VXLAN and bare metal get along? On Wed, May 1, 2019 at 6:38 PM Julia Kreger wrote: > > Greetings Jeremy, > > Best Practice wise, I'm not directly aware of any. It is largely going > to depend upon your Neutron ML2 drivers and network fabric. > > In essence, you'll need an ML2 driver which supports the vnic type of > "baremetal", which is able to able to orchestrate the switch port port > binding configuration in your network fabric. 
If your using vlan > networks, in essence you'll end up with a neutron physical network > which is also a trunk port to the network fabric, and the ML2 driver > would then appropriately tag the port(s) for the baremetal node to the > networks required. In the CI gate, we do this in the "multitenant" > jobs where networking-generic-switch modifies the OVS port > configurations directly. > > If specifically vxlan is what your looking to use between VMs and > baremetal nodes, I'm unsure of how you would actually configure that, > but in essence the VXLANs would still need to be terminated on the > switch port via the ML2 driver. > > In term of Ironic's documentation, If you haven't already seen it, you > might want to check out ironic's multi-tenancy documentation[1]. > > -Julia > > [1]: https://docs.openstack.org/ironic/latest/admin/multitenancy.html > > On Wed, May 1, 2019 at 10:53 AM Jeremy Freudberg > wrote: > > > > Hi all, > > > > I'm wondering if anyone has any best practices for Ironic bare metal > > nodes and regular VMs living on the same network. I'm sure if involves > > Ironic's `neutron` multi-tenant network driver, but I'm a bit hazy on > > the rest of the details (still very much in the early stages of > > exploring Ironic). Surely it's possible, but I haven't seen mention of > > this anywhere (except the very old spec from 2015 about introducing > > ML2 support into Ironic) nor is there a gate job resembling this > > specific use. > > > > Ideas? > > > > Thanks, > > Jeremy > > From openstack at fried.cc Fri May 3 04:36:19 2019 From: openstack at fried.cc (Eric Fried) Date: Thu, 2 May 2019 22:36:19 -0600 Subject: [nova][ptg] Summary: Resource Management Daemon Message-ID: Specs: - Base enablement: https://review.openstack.org/#/c/651130/ - Power management using CPU core P state control: https://review.openstack.org/#/c/651024/ - Last-level cache: https://review.openstack.org/#/c/651233/ Summary: - Represent new resources (e.g. last-level cache) which can be used for scheduling. - Resource Management Daemon (RMD) manages the (potentially dynamic) assignment of these resources to VMs. Direction: - There shall be no direct communication between nova-compute (including virt driver) and RMD. - Admin/orchestration to supply "conf" [1] describing the resources. - Nova processes this conf while updating provider trees to make the resources appear appropriately in placement. - Flavors can be designed to request the resources so they are considered and allocated during scheduling. - RMD must do its thing "out of band", e.g. triggered by listening for events (recommended: libvirt events, which are local to the host, rather than nova events) and requesting/introspecting information from flavor/image/placement. - Things not related to resource (like p-state control) can use traits to ensure scheduling to capable hosts. (Also potential to use forbidden aggregates [2] to isolate those hosts to only p-state-needing VMs.) - Delivery mechanism for RMD 'policy' artifacts via an extra spec with an opaque string which may represent e.g. a glance UUID, swift object, etc. efried [1] There has been a recurring theme of needing "some kind of config" - not necessarily nova.conf or any oslo.config - that can describe: - Resource provider name/uuid/parentage, be it an existing provider or a new nested provider; - Inventory (e.g. last-level cache in this case); - Physical resource(s) to which the inventory corresponds (e.g. "cache ways" in this case); - Traits, aggregates, other? 
As of this writing, no specifics have been decided, even to the point of positing that it could be the same file for some/all of the specs for which the issue arose. [2] http://lists.openstack.org/pipermail/openstack-discuss/2019-May/005803.html From openstack at fried.cc Fri May 3 04:41:33 2019 From: openstack at fried.cc (Eric Fried) Date: Thu, 2 May 2019 22:41:33 -0600 Subject: [nova][ptg] Summary: Replace python-*client with OpenStack SDK Message-ID: <9b7f4f6a-0355-ad21-d6c7-91f8415d9be7@fried.cc> Blueprint: https://blueprints.launchpad.net/nova/+spec/openstacksdk-in-nova Summary: - Enable use of OpenStack SDK from nova. - Phase out use of python-*client (for * in ironic, glance, cinder, neutron...) eventually removing those deps completely from nova. - SDK capable of using ksa oslo.config options, so no changes necessary in deployments; but deployments can start using clouds.yaml as they choose. Agreement: Do it. Action: Reviewers to look at the blueprint and decide whether a spec is needed. efried . From florian.engelmann at everyware.ch Fri May 3 06:59:47 2019 From: florian.engelmann at everyware.ch (Florian Engelmann) Date: Fri, 3 May 2019 08:59:47 +0200 Subject: [all projects] events aka notifications Message-ID: Hi, most or all openstack services do send notifications/events to the message bus. How to know which notifications are sent? Is there some list of the event names? All the best, Flo -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5230 bytes Desc: not available URL: From doka.ua at gmx.com Fri May 3 07:07:26 2019 From: doka.ua at gmx.com (Volodymyr Litovka) Date: Fri, 3 May 2019 10:07:26 +0300 Subject: [octavia] Error while creating amphora In-Reply-To: References: Message-ID: <867dde2f-83ca-63ce-5ee7-bfa962ff46aa@gmx.com> Hi Michael, the reason is my personal perception that file injection is quite legacy way and I even didn't know whether it enabed or no in my installation :-) When configdrive is available, I'd prefer to use it in every case. I set "user_data_config_drive" to False and passed this step. Thanks for pointing on this. Now working with next issues launching amphorae, will back soon :-) Thank you. On 5/2/19 5:58 PM, Michael Johnson wrote: > Volodymyr, > > It looks like you have enabled "user_data_config_drive" in the > octavia.conf file. Is there a reason you need this? If not, please > set it to False and it will resolve your issue. > > It appears we have a python3 bug in the "user_data_config_drive" > capability. It is not generally used and appears to be missing test > coverage. > > I have opened a story (bug) on your behalf here: > https://storyboard.openstack.org/#!/story/2005553 > > Michael > > On Thu, May 2, 2019 at 4:29 AM Volodymyr Litovka wrote: >> Dear colleagues, >> >> I'm using Openstack Rocky and trying to launch Octavia 4.0.0. 
After all installation steps I've got an error during 'openstack loadbalancer create' with the following log: >> >> DEBUG octavia.controller.worker.tasks.compute_tasks [-] Compute create execute for amphora with id d037721f-2cf9-492e-99cb-0be5874da0f6 execute /opt/openstack/lib/python3.6/site-packages/octavia/controller/worker/tasks/compute_tasks.py:63 >> ERROR octavia.controller.worker.tasks.compute_tasks [-] Compute create for amphora id: d037721f-2cf9-492e-99cb-0be5874da0f6 failed: TypeError: can't concat str to bytes >> ERROR octavia.controller.worker.tasks.compute_tasks Traceback (most recent call last): >> ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/octavia/controller/worker/tasks/compute_tasks.py", line 94, in execute >> ERROR octavia.controller.worker.tasks.compute_tasks config_drive_files) >> ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/octavia/common/jinja/user_data_jinja_cfg.py", line 38, in build_user_data_config >> ERROR octavia.controller.worker.tasks.compute_tasks return self.agent_template.render(user_data=user_data) >> ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/jinja2/asyncsupport.py", line 76, in render >> ERROR octavia.controller.worker.tasks.compute_tasks return original_render(self, *args, **kwargs) >> ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/jinja2/environment.py", line 1008, in render >> ERROR octavia.controller.worker.tasks.compute_tasks return self.environment.handle_exception(exc_info, True) >> ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/jinja2/environment.py", line 780, in handle_exception >> ERROR octavia.controller.worker.tasks.compute_tasks reraise(exc_type, exc_value, tb) >> ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/jinja2/_compat.py", line 37, in reraise >> ERROR octavia.controller.worker.tasks.compute_tasks raise value.with_traceback(tb) >> ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/octavia/common/jinja/templates/user_data_config_drive.template", line 29, in top-level template code >> ERROR octavia.controller.worker.tasks.compute_tasks {{ value|indent(8) }} >> ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/jinja2/filters.py", line 557, in do_indent >> ERROR octavia.controller.worker.tasks.compute_tasks s += u'\n' # this quirk is necessary for splitlines method >> ERROR octavia.controller.worker.tasks.compute_tasks TypeError: can't concat str to bytes >> ERROR octavia.controller.worker.tasks.compute_tasks >> WARNING octavia.controller.worker.controller_worker [-] Task 'STANDALONE-octavia-create-amp-for-lb-subflow-octavia-cert-compute-create' (06134192-def9-420c-9feb-0d08a068f3b2) transitioned into state 'FAILURE' from state 'RUNNING' >> >> Any advises where is the problem? >> >> My environment: >> - Openstack Rocky >> - Ubuntu 18.04 >> - Octavia installed in virtualenv using pip install: >> # pip list |grep octavia >> octavia 4.0.0 >> octavia-lib 1.1.1 >> python-octaviaclient 1.8.0 >> >> Thank you. >> >> -- >> Volodymyr Litovka >> "Vision without Execution is Hallucination." -- Thomas Edison >> >> -- >> Volodymyr Litovka >> "Vision without Execution is Hallucination." 
-- Thomas Edison -- Volodymyr Litovka "Vision without Execution is Hallucination." -- Thomas Edison From florian.engelmann at everyware.ch Fri May 3 09:07:52 2019 From: florian.engelmann at everyware.ch (Florian Engelmann) Date: Fri, 3 May 2019 11:07:52 +0200 Subject: [ceilometer] events are deprecated - true? Message-ID: <3daf12c2-a82a-16b9-515a-206628bc1cff@everyware.ch> Hi, I was wondering if events are still deprecated? https://github.com/openstack/ceilometer/blob/master/doc/source/admin/telemetry-events.rst "Warning Events support is deprecated." But how to handle all those service events if ceilometer will drop the support to validate and store those messages in gnocchi? Is there any longterm plan how to handle billing then? Why should this feature be deprecated? All the best, Flo -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5230 bytes Desc: not available URL: From balazs.gibizer at ericsson.com Fri May 3 13:40:28 2019 From: balazs.gibizer at ericsson.com (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Fri, 3 May 2019 13:40:28 +0000 Subject: [all projects] events aka notifications In-Reply-To: References: Message-ID: <1556890817.16566.0@smtp.office365.com> On Fri, May 3, 2019 at 12:59 AM, Florian Engelmann wrote: > Hi, > > most or all openstack services do send notifications/events to the > message bus. How to know which notifications are sent? Is there some > list of the event names? Nova versioned notifications are documented in [1]. Cheers, gibi [1] https://docs.openstack.org/nova/latest/reference/notifications.html#existing-versioned-notifications > > All the best, > Flo From tetsuro.nakamura.bc at hco.ntt.co.jp Fri May 3 14:22:13 2019 From: tetsuro.nakamura.bc at hco.ntt.co.jp (Tetsuro Nakamura) Date: Fri, 03 May 2019 23:22:13 +0900 Subject: [placement][nova][ptg] resource provider affinity In-Reply-To: <1556631941.24201.1@smtp.office365.com> References: <776bc9b18cf33713708c22d893bd2a46d7a899ed.camel@redhat.com> <1CC272501B5BC543A05DB90AA509DED527552AD6@fmsmsx122.amr.corp.intel.com> <97bd8e53-0285-1c92-845f-21098b0b0e38@gmail.com> <21aa22e7-be7d-8ecf-b5bd-9c6afcd789f5@fried.cc> <27624C30-2BB6-43DF-9613-783674389C0B@fried.cc> <1556631941.24201.1@smtp.office365.com> Message-ID: <264f10b8-05dc-5280-28af-1f29cae91821@hco.ntt.co.jp> Sorry for the late response, Here is my thoughts on "resource provider affinity". “The rps are in a same subtree” is equivalent to “there exits an rp which is an ancestor of all the other rps” Therefore, * group_resources=1:2 means “rp2 is a descendent of rp1 (or rp1 is a descendent of rp2.)” We can extend it to cases we have more than two groups: * group_resources=1:2:3 means "both rp2 and rp3 are descendents of rp1 (or both rp1 and rp3 are of rp2 or both rp1 and rp2 are of rp3) Eric's question from PTG yesterday was whether to keep the symmetry between rps, that is, whether to take the conditions enclosed in the parentheses above. I would say yes keep the symmetry because 1. the expression 1:2:3 is more of symmetry. If we want to make it asymmetric, it should express the subtree root more explicitly like 1-2:3 or 1-2:3:4. 2. callers may not be aware of which resource (VCPU or VF) is provided by the upper/lower rp.     IOW, the caller - resource retriever (scheduler) -  doesn't want to know how the reporter - virt driver - has reported the resouces. Note that even in the symmetric world the negative expression jay suggested looks good to me. 
It enables something like: * group_resources=1:2:!3:!4 which means 1 and 2 should be in the same group but 3 shoudn't be the descendents of 1 or 2, so as 4. However, speaking in the design level, the adjacency list model (so called naive tree model), which we currently use for nested rps, is not good at retrieving subtrees (compared to e.g. nested set model[1]). [1] https://en.wikipedia.org/wiki/Nested_set_model I have looked into recursive SQL CTE (common table expression) feature which help us treat subtree easily in adjacency list model in a experimental patch [2], but unfortunately it looks like the feature is still experimental in MySQL, and we don't want to query like this per every candidates, do we? :( [2] https://review.opendev.org/#/c/636092/ Therefore, for this specific use case of NUMA affinity I'd like alternatively propose bringing a concept of resource group distance in the rp graph. * numa affinity case   - group_distance(1:2)=1 * anti numa affinity   - group_distance(1:2)>1 which can be realized by looking into the cached adjacency rp (i.e. parent id) (supporting group_distance=N (N>1) would be a future research or implement anyway overlooking the performance) One drawback of this is that we can't use this if you create multiple nested layers with more than 1 depth under NUMA rps, but is that the case for OvS bandwidth? Another alternative is having a "closure table" from where we can retrieve all the descendent rp ids of an rp without joining tables. but... online migration cost? - tetsuro -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at fried.cc Fri May 3 15:03:38 2019 From: openstack at fried.cc (Eric Fried) Date: Fri, 3 May 2019 09:03:38 -0600 Subject: [placement][nova][ptg] resource provider affinity In-Reply-To: <264f10b8-05dc-5280-28af-1f29cae91821@hco.ntt.co.jp> References: <776bc9b18cf33713708c22d893bd2a46d7a899ed.camel@redhat.com> <1CC272501B5BC543A05DB90AA509DED527552AD6@fmsmsx122.amr.corp.intel.com> <97bd8e53-0285-1c92-845f-21098b0b0e38@gmail.com> <21aa22e7-be7d-8ecf-b5bd-9c6afcd789f5@fried.cc> <27624C30-2BB6-43DF-9613-783674389C0B@fried.cc> <1556631941.24201.1@smtp.office365.com> <264f10b8-05dc-5280-28af-1f29cae91821@hco.ntt.co.jp> Message-ID: <4aa76244-fce0-86f3-a6f5-cd7f4d8cb2f0@fried.cc> > “The rps are in a same subtree” is equivalent to “there exits an rp > which is an ancestor of all the other rps” ++ > I would say yes keep the symmetry because > > 1. the expression 1:2:3 is more of symmetry. If we want to make it > asymmetric, it should express the subtree root more explicitly like > 1-2:3 or 1-2:3:4. > 2. callers may not be aware of which resource (VCPU or VF) is provided > by the upper/lower rp. >     IOW, the caller - resource retriever (scheduler) -  doesn't want to > know how the reporter - virt driver - has reported the resouces. This. (If we were going to do asymmetric, I agree we would need a clearer syntax. Another option I thought of was same_subtree1=2,3,!4. But still prefer symmetric.) > It enables something like: > * group_resources=1:2:!3:!4 > which means 1 and 2 should be in the same group but 3 shoudn't be the > descendents of 1 or 2, so as 4. In a symmetric world, this one is a little ambiguous to me. Does it mean 4 shouldn't be in the same subtree as 3 as well? 
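To ground the syntax discussion, a NUMA-affinity request would presumably end up looking something like the following on top of the existing granular groups (the group_resources parameter name and form are exactly what is being debated here, so this is only a sketch):

  GET /allocation_candidates
      ?resources1=VCPU:2,MEMORY_MB:4096
      &resources2=SRIOV_NET_VF:1
      &group_resources=1:2
      &group_policy=none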
> However, speaking in the design level, the adjacency list model (so > called naive tree model), which we currently use for nested rps, > is not good at retrieving subtrees Based on my limited understanding, we may want to consider at least initially *not* trying to do this in sql. We can gather the candidates as we currently do and then filter them afterward in python (somewhere in the _merge_candidates flow). > One drawback of this is that we can't use this if you create multiple > nested layers with more than 1 depth under NUMA rps, > but is that the case for OvS bandwidth? If the restriction is because "the SQL is difficult", I would prefer not to introduce a "distance" concept. We've come up with use cases where the nesting isn't simple. > Another alternative is having a "closure table" from where we can > retrieve all the descendent rp ids of an rp without joining tables. > but... online migration cost? Can we consider these optimizations later, if the python-side solution proves non-performant? efried . From sbauza at redhat.com Fri May 3 15:57:38 2019 From: sbauza at redhat.com (Sylvain Bauza) Date: Fri, 3 May 2019 09:57:38 -0600 Subject: [placement][nova][ptg] resource provider affinity In-Reply-To: <4aa76244-fce0-86f3-a6f5-cd7f4d8cb2f0@fried.cc> References: <776bc9b18cf33713708c22d893bd2a46d7a899ed.camel@redhat.com> <1CC272501B5BC543A05DB90AA509DED527552AD6@fmsmsx122.amr.corp.intel.com> <97bd8e53-0285-1c92-845f-21098b0b0e38@gmail.com> <21aa22e7-be7d-8ecf-b5bd-9c6afcd789f5@fried.cc> <27624C30-2BB6-43DF-9613-783674389C0B@fried.cc> <1556631941.24201.1@smtp.office365.com> <264f10b8-05dc-5280-28af-1f29cae91821@hco.ntt.co.jp> <4aa76244-fce0-86f3-a6f5-cd7f4d8cb2f0@fried.cc> Message-ID: On Fri, May 3, 2019 at 9:24 AM Eric Fried wrote: > > “The rps are in a same subtree” is equivalent to “there exits an rp > > which is an ancestor of all the other rps” > > ++ > > > I would say yes keep the symmetry because > > > > 1. the expression 1:2:3 is more of symmetry. If we want to make it > > asymmetric, it should express the subtree root more explicitly like > > 1-2:3 or 1-2:3:4. > > 2. callers may not be aware of which resource (VCPU or VF) is provided > > by the upper/lower rp. > > IOW, the caller - resource retriever (scheduler) - doesn't want to > > know how the reporter - virt driver - has reported the resouces. > > This. > > (If we were going to do asymmetric, I agree we would need a clearer > syntax. Another option I thought of was same_subtree1=2,3,!4. But still > prefer symmetric.) > > > It enables something like: > > * group_resources=1:2:!3:!4 > > which means 1 and 2 should be in the same group but 3 shoudn't be the > > descendents of 1 or 2, so as 4. > > In a symmetric world, this one is a little ambiguous to me. Does it mean > 4 shouldn't be in the same subtree as 3 as well? > > First, thanks Tetsuro for investigating ways to support such queries. Very much appreciated. I hope I can dedicate a few time this cycle to see whether I could help with implementing NUMA affinity as I see myself as the first consumer of such thing :-) > > However, speaking in the design level, the adjacency list model (so > > called naive tree model), which we currently use for nested rps, > > is not good at retrieving subtrees > > > Based on my limited understanding, we may want to consider at least > initially *not* trying to do this in sql. We can gather the candidates > as we currently do and then filter them afterward in python (somewhere > in the _merge_candidates flow). 
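For readers following along, the recursive CTE being referred to looks roughly like this against the resource_providers table (column names are from memory; the exact query in the experimental patch may differ):

  WITH RECURSIVE subtree AS (
      SELECT id, parent_provider_id
        FROM resource_providers
       WHERE id = :subtree_root_id
      UNION ALL
      SELECT rp.id, rp.parent_provider_id
        FROM resource_providers rp
        JOIN subtree s ON rp.parent_provider_id = s.id
  )
  SELECT id FROM subtree;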
> > > One drawback of this is that we can't use this if you create multiple > > nested layers with more than 1 depth under NUMA rps, > > but is that the case for OvS bandwidth? > > If the restriction is because "the SQL is difficult", I would prefer not > to introduce a "distance" concept. We've come up with use cases where > the nesting isn't simple. > > > Another alternative is having a "closure table" from where we can > > retrieve all the descendent rp ids of an rp without joining tables. > > but... online migration cost? > > Can we consider these optimizations later, if the python-side solution > proves non-performant? > > Huh, IMHO the whole benefits of having SQL with Placement was that we were getting a fast distributed lock proven safe. Here, this is a read so I don't really bother on any potential contention, but I just wanted to say that if we go this way, we absolutely need to make enough safeguards so that we don't loose the key interest of Placement. This is not trivial either way then. -Sylvain efried > . > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gr at ham.ie Fri May 3 16:51:09 2019 From: gr at ham.ie (Graham Hayes) Date: Fri, 3 May 2019 10:51:09 -0600 Subject: [Desginate][Infoblox] Using infoblox as a backend to designate In-Reply-To: References: Message-ID: <50ed14f2-fe5d-2af2-f165-1360b9832681@ham.ie> Hi, Yes - Designate needs miniDNS to be running for this to work. What we do is create a secondary zone on the InfoBlox server, and it will do a zone transfer from Designate when you update the zone. Thanks, Graham On 02/05/2019 16:59, Zavala, Miguel wrote: > Hi all, > > Ive been trying to get Infoblox integrated with designate and I am > running into some issues. Currently I can go to horizon, and create a > zone there that then shows in infoblox, but when checking the logs I get > :: Could not find 1556226600 for openstack.example. on enough > nameservers.:: I saw the documentation listed here > ,https://docs.openstack.org/designate/queens/admin/backends/infoblox.html, > says that I have to set the designate mini-dns server as my external > primary. Do I have to have a mini-dns running in order for > designate to operate correctly? Im asking because designate has a > database so it does not require synchronization like bind 9 does. I > currently have a mini-dns setup on my controller node if I do need it. > Thank you for reading! > > Regards, > > Miguel > From ed at leafe.com Fri May 3 17:02:00 2019 From: ed at leafe.com (Ed Leafe) Date: Fri, 3 May 2019 11:02:00 -0600 Subject: [placement][nova][ptg] resource provider affinity In-Reply-To: <264f10b8-05dc-5280-28af-1f29cae91821@hco.ntt.co.jp> References: <776bc9b18cf33713708c22d893bd2a46d7a899ed.camel@redhat.com> <1CC272501B5BC543A05DB90AA509DED527552AD6@fmsmsx122.amr.corp.intel.com> <97bd8e53-0285-1c92-845f-21098b0b0e38@gmail.com> <21aa22e7-be7d-8ecf-b5bd-9c6afcd789f5@fried.cc> <27624C30-2BB6-43DF-9613-783674389C0B@fried.cc> <1556631941.24201.1@smtp.office365.com> <264f10b8-05dc-5280-28af-1f29cae91821@hco.ntt.co.jp> Message-ID: On May 3, 2019, at 8:22 AM, Tetsuro Nakamura wrote: > > I have looked into recursive SQL CTE (common table expression) feature which help us treat subtree easily in adjacency list model in a experimental patch [2], > but unfortunately it looks like the feature is still experimental in MySQL, and we don't want to query like this per every candidates, do we? 
:( At the risk of repeating myself, SQL doesn’t model the relationships among entities involved with either nested providers or shared providers. These relationships are modeled simply in a graph database, avoiding the gymnastics needed to fit them into a relational DB. I have a working model of Placement that has already solved nested providers (any depth), shared providers, project usages, and more. If you have time while at PTG, grab me and I’d be happy to demonstrate. -- Ed Leafe From cdent+os at anticdent.org Fri May 3 17:36:39 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Fri, 3 May 2019 11:36:39 -0600 (MDT) Subject: [placement][ptg] Updated Agenda Message-ID: See also the agenda in: https://etherpad.openstack.org/p/placement-ptg-train Yesterday's cross project session with nova [1] was efficient enough that the expected overflow into today has not been necessary. It also filled up Placement's work queue enough that we don't really need to choose more work, just refine the plans. To that end the agenda for today is very open: Friday: 2:30-2:40: Team Photo Rest of the time: Either working with other projects in their rooms (as required), or working on refining plans, writing specs, related in the placement room. With Saturday more concrete when people may have more free time. Saturday: 09:00-??:??: Discuss possibilities with Ironic and Blazar 10:00-??:??: Cinder joins those discussions 13:30-14:30: Implementing nested magic. See http://lists.openstack.org/pipermail/openstack-discuss/2019-May/005815.html 14:30-15:00: Consumer types: https://review.opendev.org/#/c/654799/ 15:00-15:30: Catchup / Documenting Future Actions 15:30-Beer: Retrospective Refactoring and Cleanliness Goals Hacking [1] I'm currently producing some messages summarizing that, but wanted to get these agenda adjustments out first. -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent tw: @anticdent From Tushar.Patil at nttdata.com Fri May 3 17:58:44 2019 From: Tushar.Patil at nttdata.com (Patil, Tushar) Date: Fri, 3 May 2019 17:58:44 +0000 Subject: [nova][ptg] Summary: Using Forbidden Aggregates In-Reply-To: <18a3542d-68d2-fd29-253f-880e54f12369@fried.cc> References: <18a3542d-68d2-fd29-253f-880e54f12369@fried.cc> Message-ID: >> - Spec has overall support, but a few open questions. Answer those, and >> we're good to approve and move forward. I have replied to all open questions and fix the nits. I have accepted Tetsuro suggestion to add traits to the compute node resource provider in the nova placement sync_aggregates command if aggregates are configured with metadata with kye/value pair "trait:traits_name=required". Request everyone to kindly review the updated specs. https://review.opendev.org/#/c/609960/ Regards, Tushar Patil ________________________________________ From: Eric Fried Sent: Friday, May 3, 2019 12:59:15 PM To: OpenStack Discuss Subject: [nova][ptg] Summary: Using Forbidden Aggregates Spec: https://review.opendev.org/#/c/609960/ Summary: - TL;DR: Allows you to say "You can't land on a host that does X unless you specifically require X". Example: Keep my Windows-licensed hosts for Windows instances. - Exploit placement enablement for forbidden aggregates [1] in Nova - Set (nova) aggregate metadata with a syntax similar/identical to that of extra_specs for required traits (e.g. 'trait:CUSTOM_WINDOWS_ONLY': 'required') - During scheduling, nova will discover all aggregates with metadata of this form. 
For each: - Construct a list of the traits in the aggregate metadata - Subtract traits required by the server request's flavor+image. - If any traits from the aggregate remain, add this aggregate's UUID (which corresponds to a placement aggregate) to the list of "forbidden aggregates" for the GET /allocation_candidates request. Agreements: - The "discover all aggregates" bit has the potential to be slow, but is better than the alternative, which was having the admin supply the same information in a confusing conf syntax. And if performance becomes a problem, we can deal with it later; this does not paint us into a corner. - Spec has overall support, but a few open questions. Answer those, and we're good to approve and move forward. efried [1] https://docs.openstack.org/placement/latest/specs/train/approved/2005297-negative-aggregate-membership.html Disclaimer: This email and any attachments are sent in strictest confidence for the sole use of the addressee and may contain legally privileged, confidential, and proprietary data. If you are not the intended recipient, please advise the sender by replying promptly to this email and then delete and destroy this email and any attachments without any further use, copying or forwarding. From michele at acksyn.org Fri May 3 17:59:05 2019 From: michele at acksyn.org (Michele Baldessari) Date: Fri, 3 May 2019 19:59:05 +0200 Subject: [oslo][oslo-messaging][nova] Stein nova-api AMQP issue running under uWSGI In-Reply-To: References: <229a2a53-870f-44c3-5e0c-6cfa9d45d0c5@oracle.com> <3275304e-d717-8b89-557e-b650fc4f661a@oracle.com> <20190420063850.GA18527@holtby.speedport.ip> <8b9cb0e4-b3a4-986a-be59-5bba6ae00f4e@nemebean.com> Message-ID: <20190503175904.GA26117@holtby> On Mon, Apr 22, 2019 at 01:21:03PM -0500, Ben Nemec wrote: > > > On 4/22/19 12:53 PM, Alex Schultz wrote: > > On Mon, Apr 22, 2019 at 11:28 AM Ben Nemec wrote: > > > > > > > > > > > > On 4/20/19 1:38 AM, Michele Baldessari wrote: > > > > On Fri, Apr 19, 2019 at 03:20:44PM -0700, iain.macdonnell at oracle.com wrote: > > > > > > > > > > Today I discovered that this problem appears to be caused by eventlet > > > > > monkey-patching. I've created a bug for it: > > > > > > > > > > https://bugs.launchpad.net/nova/+bug/1825584 > > > > > > > > Hi, > > > > > > > > just for completeness we see this very same issue also with > > > > mistral (actually it was the first service where we noticed the missed > > > > heartbeats). iirc Alex Schultz mentioned seeing it in ironic as well, > > > > although I have not personally observed it there yet. > > > > > > Is Mistral also mixing eventlet monkeypatching and WSGI? > > > > > > > Looks like there is monkey patching, however we noticed it with the > > engine/executor. So it's likely not just wsgi. I think I also saw it > > in the ironic-conductor, though I'd have to try it out again. I'll > > spin up an undercloud today and see if I can get a more complete list > > of affected services. It was pretty easy to reproduce. > > Okay, I asked because if there's no WSGI/Eventlet combination then this may > be different from the Nova issue that prompted this thread. It sounds like > that was being caused by a bad interaction between WSGI and some Eventlet > timers. If there's no WSGI involved then I wouldn't expect that to happen. > > I guess we'll see what further investigation turns up, but based on the > preliminary information there may be two bugs here. 
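(A quick way to compare the WSGI and standalone cases when digging into this: eventlet's own patcher module can report what it has green-patched in a given process. A throwaway sketch is below; the module list is just an assumption of the usual suspects.)

# Throwaway diagnostic: print which stdlib modules eventlet reports as
# monkey-patched in the current process. Run it from inside the service
# under suspicion (e.g. a debug endpoint or an interactive shell).
from eventlet import patcher

# Names eventlet.monkey_patch() knows about; adjust as needed.
MODULES = ('os', 'select', 'socket', 'thread', 'time')

def report():
    return {name: patcher.is_monkey_patched(name) for name in MODULES}

if __name__ == '__main__':
    for name, patched in sorted(report().items()):
        print('%-8s %s' % (name, 'patched' if patched else 'not patched'))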
So just to get some closure on this error that we have seen around mistral executor and tripleo with python3: this was due to the ansible action that called subprocess which has a different implementation in python3 and so the monkeypatching needs to be adapted. Review which fixes it for us is here: https://review.opendev.org/#/c/656901/ Damien and I think the nova_api/eventlet/mod_wsgi has a separate root-cause (although we have not spent all too much time on that one yet) cheers. Michele -- Michele Baldessari C2A5 9DA3 9961 4FFB E01B D0BC DDD4 DCCB 7515 5C6D From cdent+os at anticdent.org Fri May 3 18:22:45 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Fri, 3 May 2019 12:22:45 -0600 (MDT) Subject: [placement][nova][ptg] Summary: Nested Magic With Placement Message-ID: This message is an attempt to summarize some of the discussions held yesterday during the nova-placement cross project session [1] that were in some way related to the handling of nested providers. There were several individual topics: * NUMA Topology with placement * Subtree affinity with placement * Request group mapping * Resourceless trait filters But they are all closely inter-related, so summarizing the discussion here as a lump. There was some discussion raised about whether representing NUMA topology in placement was worth pursuing as it is not strictly necessary and dang it is sure taking us a long time to get there and will replace an existing set of worms with a new set of worms. The conversation to resolved to: It's worth trying. To make it work there are some adjustments required to how placement operates: * We need to implement a form of the can_split feature (as previously described in [2]) to allow some classes of resource to be satisfied by multiple providers. * The `group_policy=same_tree[...]` concept is needed (as initially proposed in [3]) for affinity (and anti). Some discussion on implementation has started at [4] and there will be in-person discussion in the placement PTG room tomorrow (Saturday) afternoon. * trait and aggregate membership should "flow down" when making any kind of request (numbered or unnumbered). This is closely tied to the implementation of the previous point. * Being able to express 'required' without a 'resources' is required when making an allocation candidates query. * There are several in-flight nova patches where some hacks to flavors are being performed to work around the current lack of this feature. These are okay and safe to carry on with because they are ephemeral. * The number required and resources query parameters need to accept arbitrary strings so it is possible to say things like 'resources_compute' and 'resources_network' to allow conventions to emerge when multiple parties may be involved in manipulating a RequestGroup. * A 'mappings' key will be added to the 'allocations' object in the allocation_candidates response that will support request group mapping. * There will be further discussion on these features Saturday at the PTG starting at 13:30. Actions: * This (Friday) afternoon at the PTG I'll be creating rfe stories associated with these changes. If you'd like to help with that, find me in the placement room (109). We'll work out whether those stories needs specs in the normally processing of the stories. We'll also need to find owners for many of them. * Gibi will be updating the request group mapping spec. 
[1] https://etherpad.openstack.org/p/ptg-train-xproj-nova-placement [2] https://review.opendev.org/#/c/560974/ [3] http://lists.openstack.org/pipermail/openstack-discuss/2019-April/005673.html [4] http://lists.openstack.org/pipermail/openstack-discuss/2019-May/005815.html [5] https://review.opendev.org/#/c/597601/ -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent tw: @anticdent From openstack at fried.cc Fri May 3 18:32:24 2019 From: openstack at fried.cc (Eric Fried) Date: Fri, 3 May 2019 12:32:24 -0600 Subject: [nova][ptg] Summary: API inconsistency cleanups Message-ID: Spec: https://review.openstack.org/#/c/603969/ Summary: The spec describes a bunch of mostly-unrelated-to-each-other API cleanups, listed below. The spec proposes to do them all in a single microversion. Consensus: - 400 for unknown/invalid params in querystring / request body => Do it. - Remove OS-* prefix from request and response field. - Proposed alternative: accept either, return both => Don't do it (either way) - Making server representation always consistent among all APIs returning the complete server representation. => Do it (in the same microversion) - Return ``servers`` field as an empty list in response of GET /os-hypervisors when there are no servers (currently it is omitted) => Do it (in the same microversion) - Consistent error codes on quota exceeded => Don't do it - Lump https://review.opendev.org/#/c/648919/ (change flavors.swap default from '' [string] to 0 [int] in the response) into the same effort? => Do it (in the same microversion) efried . From paye600 at gmail.com Fri May 3 18:48:10 2019 From: paye600 at gmail.com (Roman Gorshunov) Date: Fri, 3 May 2019 20:48:10 +0200 Subject: [tc][all] Github mirroring (or lack thereof) for unofficial projects Message-ID: Hello Jim, team, I'm from Airship project. I agree with archival of Github mirrors of repositories. One small suggestion: could we have project descriptions adjusted to point to the new location of the source code repository, please? E.g. "The repo now lives at opendev.org/x/y". Thanks to AJaeger & clarkb. Thank you. Best regards, -- Roman Gorshunov From Tim.Bell at cern.ch Fri May 3 18:58:41 2019 From: Tim.Bell at cern.ch (Tim Bell) Date: Fri, 3 May 2019 18:58:41 +0000 Subject: [cinder][ops] Nested Quota Driver Use? In-Reply-To: <20190502003249.GA1432@sm-workstation> References: <20190502003249.GA1432@sm-workstation> Message-ID: We're interested in the overall functionality but I think unified limits is the place to invest and thus would not have any problem deprecating this driver. We'd really welcome this being implemented across all the projects in a consistent way. The sort of functionality proposed in https://techblog.web.cern.ch/techblog/post/nested-quota-models/ would need Nova/Cinder/Manila at miniumum for CERN to switch. So, no objections to deprecation but strong support to converge on unified limits. Tim -----Original Message----- From: Sean McGinnis Date: Thursday, 2 May 2019 at 02:39 To: "openstack-discuss at lists.openstack.org" Subject: [cinder][ops] Nested Quota Driver Use? Hey everyone, I'm hoping to get some feedback from folks, especially operators. In the Liberty release, Cinder introduced the ability to use a Nest Quota Driver to handle cases of heirarchical projects and quota enforcement [0]. I have not heard of anyone actually using this. I also haven't seen any bugs filed, which makes me a little suspicious given how complicated it can be. I would like to know if any operators are using this for nested quotas. 
There is an effort underway for a new mechanism called "unified limits" that will require a lot of modifications to the Cinder code. If this quota driver is not needed, I would like to deprecated it in Train so it can be removed in the U release and hopefully prevent some unnecessary work being done. Any feedback on this would be appreciated. Thanks! Sean [0] https://specs.openstack.org/openstack/cinder-specs/specs/liberty/cinder-nested-quota-driver.html From arbermejo0417 at gmail.com Fri May 3 19:05:42 2019 From: arbermejo0417 at gmail.com (Alejandro Ruiz Bermejo) Date: Fri, 3 May 2019 15:05:42 -0400 Subject: [ZUN] Proxy on Docker + Zun Message-ID: I'm still working on my previous error of the openstack appcontainer run error state: I have Docker working behind a Proxy. As you can see in the Docker info i attach to this mail. I tried to do the curl http://10.8.9.54:2379/health with the proxy environment variable and i got timeout error (without it the curl return the normal healthy state for the etcd cluster). So my question is if i'm having a problem with the proxy configuration and docker commands when i'm executing the openstack appcontainer run. And if you know any use case of someone working with Docker behind a proxy and Zun in the Openstack environment. This is the outputh of # systemctl show --property Environment docker Environment=HTTP_PROXY=http://10.8.7.60:3128/ NO_PROXY=localhost, 127.0.0.0/8,10.8.0.0/16 HTTPS_PROXY=http://10.8.7.60:3128/ And this is the one of root at compute /h/team# docker info Containers: 9 Running: 0 Paused: 0 Stopped: 9 Images: 7 Server Version: 18.09.5 Storage Driver: overlay2 Backing Filesystem: extfs Supports d_type: true Native Overlay Diff: true Logging Driver: json-file Cgroup Driver: cgroupfs Plugins: Volume: local Network: bridge host macvlan null overlay Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog Swarm: inactive Runtimes: runc Default Runtime: runc Init Binary: docker-init containerd version: bb71b10fd8f58240ca47fbb579b9d1028eea7c84 runc version: 2b18fe1d885ee5083ef9f0838fee39b62d653e30 init version: fec3683 Security Options: apparmor seccomp Profile: default Kernel Version: 4.15.0-48-generic Operating System: Ubuntu 18.04.2 LTS OSType: linux Architecture: x86_64 CPUs: 8 Total Memory: 15.66GiB Name: compute ID: W35H:WCPP:AM3K:NENH:FEOR:S23C:N3FZ:QELB:LLUR:USMJ:IM7W:YMFX Docker Root Dir: /var/lib/docker Debug Mode (client): false Debug Mode (server): false HTTP Proxy: http://10.8.7.60:3128/ HTTPS Proxy: http://10.8.7.60:3128/ No Proxy: localhost,127.0.0.0/8,10.8.0.0/16 Registry: https://index.docker.io/v1/ Labels: Experimental: false Cluster Store: etcd://10.8.9.54:2379 Insecure Registries: 127.0.0.0/8 Live Restore Enabled: false Product License: Community Engine WARNING: API is accessible on http://compute:2375 without encryption. Access to the remote API is equivalent to root access on the host. Refer to the 'Docker daemon attack surface' section in the documentation for more information: https://docs.docker.com/engine/security/security/#docker-daemon-attack-surface WARNING: No swap limit support -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pabelanger at redhat.com Fri May 3 19:05:38 2019 From: pabelanger at redhat.com (Paul Belanger) Date: Fri, 3 May 2019 15:05:38 -0400 Subject: [tc][all] Github mirroring (or lack thereof) for unofficial projects In-Reply-To: References: Message-ID: <20190503190538.GB3377@localhost.localdomain> On Fri, May 03, 2019 at 08:48:10PM +0200, Roman Gorshunov wrote: > Hello Jim, team, > > I'm from Airship project. I agree with archival of Github mirrors of > repositories. One small suggestion: could we have project descriptions > adjusted to point to the new location of the source code repository, > please? E.g. "The repo now lives at opendev.org/x/y". > This is something important to keep in mind from infra side, once the repo is read-only, we lose the ability to use the API to change it. >From manage-projects.py POV, we can update the description before flipping the archive bit without issues, just need to make sure we have the ordering correct. Also, there is no API to unarchive a repo from github sadly, for that a human needs to log into github UI and click the button. I have no idea why. - Paul > Thanks to AJaeger & clarkb. > > Thank you. > > Best regards, > -- Roman Gorshunov > From cdent+os at anticdent.org Fri May 3 20:08:00 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Fri, 3 May 2019 14:08:00 -0600 (MDT) Subject: [placement][nova][ptg] Summary: Shared resource providers for shared disk on compute hosts Message-ID: See also: https://etherpad.openstack.org/p/ptg-train-xproj-nova-placement There's a spec in progress about turning on support for shared disk providers [1]. We discussed some of the details that need to be resolved and actions that need to be taken. The next action is for Tushar to update the spec to reflect the decisions and alternatives: * For some virt drivers, we need example one or two tools for: * creating a shared disk provider, setting inventory, creating aggregate, adding compute nodes to the aggregate * updating inventory when the (absolute) size of the storage changes These were initially discussed as example tools that live in the placement repo but it might actually be better in nova. There's an abandoned example [2] from long ago. * Other virt drivers (and potentially Ceph w/libvirt if a reliable source of identifier is available) will be able to manage this sort of thing themselves in update_provider_tree. * Other options (for managing the initial management of the shared disk provider) include: * write the provider info into a well-known file on the shared disk * variations on the inventory.yaml file * We would like to have shared disk testing in the gate. Matt has started https://review.opendev.org/#/c/586363/ but it does not test multinode, yet. Note that apart from the sample tools described above, which might be in the placement repo, the required actions here are on the nova side. At least until we find bugs on the placement side resulting from this work. [1] https://review.opendev.org/#/c/650188/ [2] https://review.opendev.org/382613 -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent tw: @anticdent From cdent+os at anticdent.org Fri May 3 20:16:57 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Fri, 3 May 2019 14:16:57 -0600 (MDT) Subject: [placement][nova][ptg] Summary: Testing PlacementFixture effectively Message-ID: See also: https://etherpad.openstack.org/p/ptg-train-xproj-nova-placement Nova uses the PlacementFixture (provided by placement) to be able to do functional tests with a real placement API and database. 
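(For anyone unfamiliar with it, consuming the fixture from a test looks roughly like the sketch below. The import path is an assumption based on where the fixture lives in the placement tree and may differ in your checkout, so treat it as illustrative only.)

# Rough illustration of using the PlacementFixture from a functional test.
# The import path is an assumption and may not match your checkout.
import testtools

from placement.tests.functional.fixtures import placement as placement_fixtures


class TestWithRealPlacement(testtools.TestCase):
    def setUp(self):
        super(TestWithRealPlacement, self).setUp()
        # Stands up an in-process placement WSGI app backed by a fresh
        # database; the test can then exercise the real placement API.
        self.placement = self.useFixture(placement_fixtures.PlacementFixture())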
This works pretty well but we discovered during the run to the end of Stein that seemingly unrelated changes in placement could break the fixture and bring nova's gate to a halt. Bad. Decision: Placement will run nova's functional tests in its own gate on each change. If it proves to save some time the api_sample tests will be blacklisted. We do not want to whitelist as that will lead to trouble in the future. There was discussion of doing this for osc-placement as well, but since we just saved a bunch of elapsed time with functional tests in osc-placement with https://review.opendev.org/#/c/651939/ and there's no integrated gate criticality with osc-placement, we decided not to. Action: cdent will make a story and do this -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent tw: @anticdent From persia at shipstone.jp Fri May 3 20:49:42 2019 From: persia at shipstone.jp (Emmet Hikory) Date: Sat, 4 May 2019 05:49:42 +0900 Subject: [tc] Proposal: restrict TC activities Message-ID: <20190503204942.GB28010@shipstone.jp> All, I’ve spent the last few years watching the activities of the technical committee , and in recent cycles, I’m seeing a significant increase in both members of our community asking the TC to take action on things, and the TC volunteering to take action on things in the course of internal discussions (meetings, #openstack-tc, etc.). In combination, these trends appear to have significantly increased the amount of time that members of the technical committee spend on “TC work”, and decreased the time that they spend on other activities in OpenStack. As such, I suggest that the Technical Committee be restricted from actually doing anything beyond approval of merges to the governance repository. Firstly, we select members of the technical committee from amongst those of us who have some of the deepest understanding of the entire project and frequently those actively involved in multiple projects and engaged in cross-project coordination on a regular basis. Anything less than this fails to produce enough name recognition for election. As such, when asking the TC to be responsible for activities, we should equally ask whether we wish the very people responsible for the efficiency of our collaboration to cease doing so in favor of whatever we may have assigned to the TC. Secondly, in order to ensure continuity, we need to provide a means for rotation of the TC: this is both to allow folk on the TC to pursue other activities, and to allow folk not on the TC to join the TC and help with governance and coordination. If we wish to increase the number of folk who might be eligible for the TC, we do this best by encouraging them to take on activities that involve many projects or affect activities over all of OpenStack. These activities must necessarily be taken by those not current TC members in order to provide a platform for visibility to allow those doing them to later become TC members. Solutions to both of these issues have been suggested involving changing the size of the TC. If we decrease the size of the TC, it becomes less important to provide mechanisms for new people to develop reputation over the entire project, but this ends up concentrating the work of the TC to a smaller number of hands, and likely reduces the volume of work overall accomplished. 
If we increase the size of the TC, it becomes less burdensome for the TC to take on these activities, but this ends up foundering against the question of who in our community has sufficient experience with all aspects of OpenStack to fill the remaining seats (and how to maintain a suitable set of folk to provide TC continuity). If we instead simply assert that the TC is explicitly not responsible for any activities beyond governance approvals, we both reduce the impact that being elected to the TC has on the ability of our most prolific contributors to continue their activities and provide a means for folk who have expressed interest and initiative to broadly contribute and demonstrate their suitability for nomination in a future TC election Feedback encouraged -- Emmet HIKORY From sfinucan at redhat.com Fri May 3 20:54:12 2019 From: sfinucan at redhat.com (Stephen Finucane) Date: Fri, 03 May 2019 14:54:12 -0600 Subject: Retiring bilean Message-ID: The Bilean appears to be dead and has had no activity in over two years. I would like to retire the repository. Please let me know if there are any objections. I'm proposing patches now with topic retire-bilean. Stephen From sfinucan at redhat.com Fri May 3 20:55:15 2019 From: sfinucan at redhat.com (Stephen Finucane) Date: Fri, 03 May 2019 14:55:15 -0600 Subject: Retiring hurricane Message-ID: <6f80aca07d72dd16e190c4396a15cdca39724b72.camel@redhat.com> The x/hurricane repo was created but has not been populated in the two years since. I would like to retire the repository. Please let me know if there are any objections. I'm proposing patches now with topic retire-hurricane. Stephen From sfinucan at redhat.com Fri May 3 20:56:24 2019 From: sfinucan at redhat.com (Stephen Finucane) Date: Fri, 03 May 2019 14:56:24 -0600 Subject: Retiring ailuropoda Message-ID: <07e22626baf782deb7cbedddafadf9b655612594.camel@redhat.com> The ailuropoda project appears to be dead and has had no activity in nearly three years. I would like to retire the repository. Please let me know if there are any objections. I'm proposing patches now with topic retire-ailuropoda. Stephen From cdent+os at anticdent.org Fri May 3 21:13:38 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Fri, 3 May 2019 15:13:38 -0600 (MDT) Subject: [placement][nova][ptg] resource provider affinity In-Reply-To: <4aa76244-fce0-86f3-a6f5-cd7f4d8cb2f0@fried.cc> References: <776bc9b18cf33713708c22d893bd2a46d7a899ed.camel@redhat.com> <1CC272501B5BC543A05DB90AA509DED527552AD6@fmsmsx122.amr.corp.intel.com> <97bd8e53-0285-1c92-845f-21098b0b0e38@gmail.com> <21aa22e7-be7d-8ecf-b5bd-9c6afcd789f5@fried.cc> <27624C30-2BB6-43DF-9613-783674389C0B@fried.cc> <1556631941.24201.1@smtp.office365.com> <264f10b8-05dc-5280-28af-1f29cae91821@hco.ntt.co.jp> <4aa76244-fce0-86f3-a6f5-cd7f4d8cb2f0@fried.cc> Message-ID: On Fri, 3 May 2019, Eric Fried wrote: >> Another alternative is having a "closure table" from where we can >> retrieve all the descendent rp ids of an rp without joining tables. >> but... online migration cost? > > Can we consider these optimizations later, if the python-side solution > proves non-performant? My preference would be that we start with the simplest option (make multiple selects, merge them appropriately in Python) and, as Eric says, if that's not good enough, pursue the optimizations. In fact, I think we should likely pursue the optimizations [1] in any case, but they should come _after_ we have some measurements. Jay provided a proposed algorithm in [2]. 
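(In the spirit of measuring before optimizing, even something as blunt as the sketch below, wrapped around the candidate gathering/merging call of interest, is enough to get baseline numbers; the wrapped call is a placeholder, not real placement code.)

# Blunt, dependency-free timing helper for grabbing baseline numbers.
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        print('%s took %.3f ms' % (label, elapsed_ms))

# Usage (placeholder call):
# with timed('merge candidates'):
#     candidates = merge_candidates(raw_groups)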
We have a time slot tomorrow (Saturday May 3) at 13:30 to discuss some of the finer points of implementing nested magic [3]. [1] Making placement faster is constantly a goal, but it is a secondary goal. [2] http://lists.openstack.org/pipermail/openstack-discuss/2019-April/005432.html [3] http://lists.openstack.org/pipermail/openstack-discuss/2019-May/005823.html -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent tw: @anticdent From sfinucan at redhat.com Fri May 3 21:17:43 2019 From: sfinucan at redhat.com (Stephen Finucane) Date: Fri, 03 May 2019 15:17:43 -0600 Subject: Retiring aeromancer Message-ID: <7c289216c6d1079d2c7d9c4c03b3740ebf5a5339.camel@redhat.com> The aeromancer project appears to be dead and has had no activity in over four years. I would like to retire the repository. Please let me know if there are any objections. I'm proposing patches now with topic retire-aeromancer. Stephen From openstack at fried.cc Fri May 3 21:20:07 2019 From: openstack at fried.cc (Eric Fried) Date: Fri, 3 May 2019 15:20:07 -0600 Subject: [nova][neutron][ptg] Summary: Leaking resources when ports are deleted out-of-band Message-ID: <62ef48e0-9425-9191-a648-c1009c1032b7@fried.cc> Summary: When a port is deleted out of band (while still attached to an instance), any associated QoS bandwidth resources are orphaned in placement. Consensus: - Neutron to block deleting a port whose "owner" field is set. - If you really want to do this, null the "owner" field first. - Nova still needs a way to delete the port during destroy. To be discussed. Possibilities: - Nova can null the "owner" field first. - The operation can be permitted with a certain policy role, which Nova would have to be granted. - Other? efried . From sbauza at redhat.com Fri May 3 21:34:55 2019 From: sbauza at redhat.com (Sylvain Bauza) Date: Fri, 3 May 2019 15:34:55 -0600 Subject: [placement][nova][ptg] resource provider affinity In-Reply-To: References: <776bc9b18cf33713708c22d893bd2a46d7a899ed.camel@redhat.com> <1CC272501B5BC543A05DB90AA509DED527552AD6@fmsmsx122.amr.corp.intel.com> <97bd8e53-0285-1c92-845f-21098b0b0e38@gmail.com> <21aa22e7-be7d-8ecf-b5bd-9c6afcd789f5@fried.cc> <27624C30-2BB6-43DF-9613-783674389C0B@fried.cc> <1556631941.24201.1@smtp.office365.com> <264f10b8-05dc-5280-28af-1f29cae91821@hco.ntt.co.jp> <4aa76244-fce0-86f3-a6f5-cd7f4d8cb2f0@fried.cc> Message-ID: On Fri, May 3, 2019 at 3:19 PM Chris Dent wrote: > On Fri, 3 May 2019, Eric Fried wrote: > > >> Another alternative is having a "closure table" from where we can > >> retrieve all the descendent rp ids of an rp without joining tables. > >> but... online migration cost? > > > > Can we consider these optimizations later, if the python-side solution > > proves non-performant? > > My preference would be that we start with the simplest option (make > multiple selects, merge them appropriately in Python) and, as Eric > says, if that's not good enough, pursue the optimizations. > > In fact, I think we should likely pursue the optimizations [1] in > any case, but they should come _after_ we have some measurements. > > Jay provided a proposed algorithm in [2]. > > That plan looks good to me, with the slight detail that I want to reinforce the fact that python usage will have a cost anyway, which is to drift us from the perfect world of having a distributed transactional model for free. 
This is to say, we should refrain *as much as possible* from any attempt to get rid of SQL in favour of Python (or other tools) until we have a solid consensus that those tools are as efficient and as accurate as the current approach.

> We have a time slot tomorrow (Saturday May 3) at 13:30 to discuss
> some of the finer points of implementing nested magic [3].
>
I'll try to be present.
-Sylvain

> [1] Making placement faster is constantly a goal, but it is a
> secondary goal.
>
> [2]
> http://lists.openstack.org/pipermail/openstack-discuss/2019-April/005432.html
>
> [3]
> http://lists.openstack.org/pipermail/openstack-discuss/2019-May/005823.html
>
> --
> Chris Dent ٩◔̯◔۶ https://anticdent.org/
> freenode: cdent tw: @anticdent
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From balazs.gibizer at ericsson.com  Fri May  3 21:35:23 2019
From: balazs.gibizer at ericsson.com (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=)
Date: Fri, 3 May 2019 21:35:23 +0000
Subject: [nova][neutron][ptg] Summary: Leaking resources when ports are deleted out-of-band
In-Reply-To: <62ef48e0-9425-9191-a648-c1009c1032b7@fried.cc>
References: <62ef48e0-9425-9191-a648-c1009c1032b7@fried.cc>
Message-ID: <1556919312.16566.2@smtp.office365.com>

On Fri, May 3, 2019 at 3:20 PM, Eric Fried wrote:
> Summary: When a port is deleted out of band (while still attached to
> an instance), any associated QoS bandwidth resources are orphaned in
> placement.
>
> Consensus:
> - Neutron to block deleting a port whose "owner" field is set.
> - If you really want to do this, null the "owner" field first.
> - Nova still needs a way to delete the port during destroy. To be
> discussed. Possibilities:
> - Nova can null the "owner" field first.
> - The operation can be permitted with a certain policy role, which
> Nova would have to be granted.
> - Other?

Two additions:

1) Nova will log an ERROR when the leak happens. (Nova knows the port_id and the RP UUID but doesn't know the size of the allocation to remove it). This logging can be added today.

2) Matt had a point after the session that if Neutron enforces that only unbound ports can be deleted, then not only Nova needs to be changed to unbind a port before deleting it, but possibly other Neutron consumers too (Octavia?).

Cheers,
gibi

> efried
> .
>

From sfinucan at redhat.com  Fri May  3 21:37:23 2019
From: sfinucan at redhat.com (Stephen Finucane)
Date: Fri, 03 May 2019 15:37:23 -0600
Subject: Retiring mors
Message-ID: <8d05d9032e98dca63ff7c00b0b3b43e86f4a367f.camel@redhat.com>

The mors project appears to be dead and has had no activity in nearly two years. I would like to retire the repository. Please let me know if there are any objections.

I'm proposing patches now with topic retire-mors.

Stephen

From sfinucan at redhat.com  Fri May  3 21:42:49 2019
From: sfinucan at redhat.com (Stephen Finucane)
Date: Fri, 03 May 2019 15:42:49 -0600
Subject: Retiring alexandria
Message-ID: 

The alexandria project appears to be dead and has had no activity in over three years. I would like to retire the repository. Please let me know if there are any objections.

I'm proposing patches now with topic retire-alexandria.
Stephen

From doug at doughellmann.com  Fri May  3 22:12:17 2019
From: doug at doughellmann.com (Doug Hellmann)
Date: Fri, 03 May 2019 16:12:17 -0600
Subject: Retiring aeromancer
In-Reply-To: <7c289216c6d1079d2c7d9c4c03b3740ebf5a5339.camel@redhat.com>
References: <7c289216c6d1079d2c7d9c4c03b3740ebf5a5339.camel@redhat.com>
Message-ID: 

Stephen Finucane writes:

> The aeromancer project appears to be dead and has had no activity in
> over four years. I would like to retire the repository. Please let
> me know if there are any objections.
>
> I'm proposing patches now with topic retire-aeromancer.
>
> Stephen
>
>

That one was mine. Go right ahead.

I recommend that folks look at beagle [1] if they want a command line tool for submitting searches to codesearch.openstack.org.

[1] https://pypi.org/project/beagle/

--
Doug

From mriedemos at gmail.com  Fri May  3 22:10:46 2019
From: mriedemos at gmail.com (Matt Riedemann)
Date: Fri, 3 May 2019 16:10:46 -0600
Subject: [nova][neutron][ptg] Summary: Leaking resources when ports are deleted out-of-band
In-Reply-To: <1556919312.16566.2@smtp.office365.com>
References: <62ef48e0-9425-9191-a648-c1009c1032b7@fried.cc> <1556919312.16566.2@smtp.office365.com>
Message-ID: <5f87ea30-0bdf-31a4-a3f5-0e9d201b3665@gmail.com>

On 5/3/2019 3:35 PM, Balázs Gibizer wrote:
> 2) Matt had a point after the session that if Neutron enforces that
> only unbound ports can be deleted, then not only Nova needs to be changed
> to unbind a port before deleting it, but possibly other Neutron
> consumers too (Octavia?).

And potentially Zun, there might be others, Magnum, Heat, idk?

Anyway, this is a thing that has been around forever which admins shouldn't do. Do we need to prioritize making this change in both neutron and nova to make two requests to delete a bound port? Or is just logging the ERROR that you've leaked allocations, tsk tsk, enough? I tend to think the latter is fine until someone comes along saying this is really hurting them and they have a valid use case for deleting bound ports out of band from nova.

--

Thanks,

Matt

From smooney at redhat.com  Fri May  3 22:19:48 2019
From: smooney at redhat.com (Sean Mooney)
Date: Fri, 3 May 2019 23:19:48 +0100
Subject: Neutron - Nova cross project topics
Message-ID: 

https://etherpad.openstack.org/p/ptg-train-xproj-nova-neutron

PTG summary: below is a summary of the section I led for the cross project sessions. Hopefully the others can extend this with their sections too.

Topic: Optional NUMA affinity for neutron ports (sean-k-mooney)
Summary: We will model NUMA affinity of neutron ports via a new qos rule type that will be applied to the port. Neutron will communicate the policy to nova, allowing different policies per interface. The NUMA policies will be defined in the spec and will likely just be the ones we support already today in the PCI alias.
AR: sean-k-mooney to write sibling specs for nova and neutron

Topic: track neutron ports in placement.
Summary: Nova will create RPs for each SR-IOV PF and apply the NIC feature flags and physnet as traits. The RP name will contain the PF netdev name. Neutron L2 agents will add inventories of ports under existing agent RPs. This will allow us to track the capacity of each network backend as well as schedule based on NIC feature flags, vnic type and physnets.
Details will be worked out in the specs and it will target the U cycle.
AR: sean-k-mooney to write sibling specs for nova and neutron for U

Topic: port binding records https://review.openstack.org/#/c/645173/
Summary: os-vif will be extended to contain new fields to record the connectivity type and ml2 driver that bound the vif. Each neutron ml2 driver will be modified to add a serialised os-vif object to the binding response. The nova.network.model.vif object will be extended to store the os-vif object. The virt drivers will conditionally skip calling the nova vif to os-vif vif object conversion function and fall back to the legacy workflow if it is not present in the nova vif object. Initially none of the legacy code will be removed until all ml2 drivers are updated.
AR: Sean and Rodolfo to update the spec, along with a nova spec for the nova-specific changes.

Topic: boot vms with unaddressed port. https://blueprints.launchpad.net/nova/+spec/boot-vm-with-unaddressed-port
Summary: Agreed we should do this and we should depend on the port binding records change.
AR: Rodolfo to start coding this up and update the spec.

From balazs.gibizer at ericsson.com  Fri May  3 22:37:48 2019
From: balazs.gibizer at ericsson.com (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=)
Date: Fri, 3 May 2019 22:37:48 +0000
Subject: [nova][neutron][ptg] Summary: Leaking resources when ports are deleted out-of-band
In-Reply-To: <1556919312.16566.2@smtp.office365.com>
References: <62ef48e0-9425-9191-a648-c1009c1032b7@fried.cc> <1556919312.16566.2@smtp.office365.com>
Message-ID: <1556923057.16566.3@smtp.office365.com>

>
> 1) Nova will log an ERROR when the leak happens. (Nova knows the
> port_id and the RP UUID but doesn't know the size of the allocation to
> remove it). This logging can be added today.

Patch is up with an ERROR log: https://review.opendev.org/#/c/657079/

gibi

From aspiers at suse.com  Fri May  3 23:05:25 2019
From: aspiers at suse.com (Adam Spiers)
Date: Fri, 3 May 2019 17:05:25 -0600
Subject: [tc][all][airship] Github mirroring (or lack thereof) for unofficial projects
In-Reply-To: <20190503190538.GB3377@localhost.localdomain>
References: <20190503190538.GB3377@localhost.localdomain>
Message-ID: <20190503230525.a3vxsnliklitnei4@arabian.linksys.moosehall>

Paul Belanger wrote:
>On Fri, May 03, 2019 at 08:48:10PM +0200, Roman Gorshunov wrote:
>>Hello Jim, team,
>>
>>I'm from Airship project. I agree with archival of Github mirrors of
>>repositories.

Which mirror repositories are you referring to here - a subset of the Airship repos which are no longer needed, or all Airship repo mirrors?

I would prefer the majority of the mirrors not to be archived, for two reasons which Alan or maybe Matt noted in the Airship discussions this morning:

1. Some people instinctively go to GitHub search when they want to find a software project. Having useful search results for "airship" on GitHub increases the discoverability of the project.

2. Some people will judge the liveness of a project by its activity metrics as shown on GitHub (e.g. number of recent commits). An active mirror helps show that the project is alive and well. In contrast, an archived mirror makes it look like the project is dead.

However if you are only talking about a small subset which are no longer needed, then archiving sounds reasonable.

>>One small suggestion: could we have project descriptions
>>adjusted to point to the new location of the source code repository,
>>please? E.g. "The repo now lives at opendev.org/x/y".
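(For what it's worth, the ordering Paul describes earlier in the thread, i.e. update the description while the repo is still writable and only then flip the archive bit, maps onto two PATCH calls against GitHub's repositories API. A rough sketch with requests is below; token handling and error handling are left out, and the helper name is made up.)

# Rough sketch: point a mirror's description at its new home, then archive it.
# Requires a token with admin rights on the repository; no error handling here.
import requests

GITHUB_API = 'https://api.github.com'

def retire_mirror(org, repo, new_location, token):
    url = '%s/repos/%s/%s' % (GITHUB_API, org, repo)
    headers = {
        'Authorization': 'token %s' % token,
        'Accept': 'application/vnd.github.v3+json',
    }
    # 1. Update the description first; once archived, the API rejects edits.
    requests.patch(url, headers=headers, json={
        'description': 'The repo now lives at %s' % new_location,
    }).raise_for_status()
    # 2. Flip the archive bit last. Note there is no API call to unarchive.
    requests.patch(url, headers=headers, json={'archived': True}).raise_for_status()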
I agree it's helpful if the top-level README.rst has a sentence like "the authoritative location for this repo is https://...". >This is something important to keep in mind from infra side, once the >repo is read-only, we lose the ability to use the API to change it. > >From manage-projects.py POV, we can update the description before >flipping the archive bit without issues, just need to make sure we have >the ordering correct. > >Also, there is no API to unarchive a repo from github sadly, for that a >human needs to log into github UI and click the button. I have no idea >why. Good points, but unless we're talking about a small subset of Airship repos, I'm a bit puzzled why this is being discussed, because I thought we reached consensus this morning on a) ensuring that all Airship projects are continually mirrored to GitHub, and b) trying to transfer those mirrors from the "openstack" organization to the "airship" one, assuming we can first persuade GitHub to kick out the org-squatters. This transferral would mean that GitHub would automatically redirect requests from https://github.com/openstack/airship-* to https://github.com/airship/... Consensus is documented in lines 107-112 of: https://etherpad.openstack.org/p/airship-ptg-train From johnsomor at gmail.com Fri May 3 23:05:49 2019 From: johnsomor at gmail.com (Michael Johnson) Date: Fri, 3 May 2019 17:05:49 -0600 Subject: [octavia] Error while creating amphora In-Reply-To: <867dde2f-83ca-63ce-5ee7-bfa962ff46aa@gmx.com> References: <867dde2f-83ca-63ce-5ee7-bfa962ff46aa@gmx.com> Message-ID: Yes, with this setting to False, you will use config driver, but it will not use the "user_data" section of data source. Michael On Fri, May 3, 2019 at 1:07 AM Volodymyr Litovka wrote: > > Hi Michael, > > the reason is my personal perception that file injection is quite legacy > way and I even didn't know whether it enabed or no in my installation > :-) When configdrive is available, I'd prefer to use it in every case. > > I set "user_data_config_drive" to False and passed this step. Thanks for > pointing on this. > > Now working with next issues launching amphorae, will back soon :-) > > Thank you. > > On 5/2/19 5:58 PM, Michael Johnson wrote: > > Volodymyr, > > > > It looks like you have enabled "user_data_config_drive" in the > > octavia.conf file. Is there a reason you need this? If not, please > > set it to False and it will resolve your issue. > > > > It appears we have a python3 bug in the "user_data_config_drive" > > capability. It is not generally used and appears to be missing test > > coverage. > > > > I have opened a story (bug) on your behalf here: > > https://storyboard.openstack.org/#!/story/2005553 > > > > Michael > > > > On Thu, May 2, 2019 at 4:29 AM Volodymyr Litovka wrote: > >> Dear colleagues, > >> > >> I'm using Openstack Rocky and trying to launch Octavia 4.0.0. 
After all installation steps I've got an error during 'openstack loadbalancer create' with the following log: > >> > >> DEBUG octavia.controller.worker.tasks.compute_tasks [-] Compute create execute for amphora with id d037721f-2cf9-492e-99cb-0be5874da0f6 execute /opt/openstack/lib/python3.6/site-packages/octavia/controller/worker/tasks/compute_tasks.py:63 > >> ERROR octavia.controller.worker.tasks.compute_tasks [-] Compute create for amphora id: d037721f-2cf9-492e-99cb-0be5874da0f6 failed: TypeError: can't concat str to bytes > >> ERROR octavia.controller.worker.tasks.compute_tasks Traceback (most recent call last): > >> ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/octavia/controller/worker/tasks/compute_tasks.py", line 94, in execute > >> ERROR octavia.controller.worker.tasks.compute_tasks config_drive_files) > >> ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/octavia/common/jinja/user_data_jinja_cfg.py", line 38, in build_user_data_config > >> ERROR octavia.controller.worker.tasks.compute_tasks return self.agent_template.render(user_data=user_data) > >> ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/jinja2/asyncsupport.py", line 76, in render > >> ERROR octavia.controller.worker.tasks.compute_tasks return original_render(self, *args, **kwargs) > >> ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/jinja2/environment.py", line 1008, in render > >> ERROR octavia.controller.worker.tasks.compute_tasks return self.environment.handle_exception(exc_info, True) > >> ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/jinja2/environment.py", line 780, in handle_exception > >> ERROR octavia.controller.worker.tasks.compute_tasks reraise(exc_type, exc_value, tb) > >> ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/jinja2/_compat.py", line 37, in reraise > >> ERROR octavia.controller.worker.tasks.compute_tasks raise value.with_traceback(tb) > >> ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/octavia/common/jinja/templates/user_data_config_drive.template", line 29, in top-level template code > >> ERROR octavia.controller.worker.tasks.compute_tasks {{ value|indent(8) }} > >> ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/jinja2/filters.py", line 557, in do_indent > >> ERROR octavia.controller.worker.tasks.compute_tasks s += u'\n' # this quirk is necessary for splitlines method > >> ERROR octavia.controller.worker.tasks.compute_tasks TypeError: can't concat str to bytes > >> ERROR octavia.controller.worker.tasks.compute_tasks > >> WARNING octavia.controller.worker.controller_worker [-] Task 'STANDALONE-octavia-create-amp-for-lb-subflow-octavia-cert-compute-create' (06134192-def9-420c-9feb-0d08a068f3b2) transitioned into state 'FAILURE' from state 'RUNNING' > >> > >> Any advises where is the problem? > >> > >> My environment: > >> - Openstack Rocky > >> - Ubuntu 18.04 > >> - Octavia installed in virtualenv using pip install: > >> # pip list |grep octavia > >> octavia 4.0.0 > >> octavia-lib 1.1.1 > >> python-octaviaclient 1.8.0 > >> > >> Thank you. > >> > >> -- > >> Volodymyr Litovka > >> "Vision without Execution is Hallucination." 
-- Thomas Edison > >> > >> -- > >> Volodymyr Litovka > >> "Vision without Execution is Hallucination." -- Thomas Edison > > -- > Volodymyr Litovka > "Vision without Execution is Hallucination." -- Thomas Edison > From amotoki at gmail.com Fri May 3 23:22:46 2019 From: amotoki at gmail.com (Akihiro Motoki) Date: Fri, 3 May 2019 17:22:46 -0600 Subject: [nova][neutron][ptg] Summary: Leaking resources when ports are deleted out-of-band In-Reply-To: <5f87ea30-0bdf-31a4-a3f5-0e9d201b3665@gmail.com> References: <62ef48e0-9425-9191-a648-c1009c1032b7@fried.cc> <1556919312.16566.2@smtp.office365.com> <5f87ea30-0bdf-31a4-a3f5-0e9d201b3665@gmail.com> Message-ID: On Fri, May 3, 2019 at 4:11 PM Matt Riedemann wrote: > On 5/3/2019 3:35 PM, Balázs Gibizer wrote: > > 2) Matt had a point after the session that if Neutron enforces that > > only unbound port can be deleted then not only Nova needs to be changed > > to unbound a port before delete it, but possibly other Neutron > > consumers (Octavia?). > > And potentially Zun, there might be others, Magnum, Heat, idk? > > Anyway, this is a thing that has been around forever which admins > shouldn't do, do we need to prioritize making this change in both > neutron and nova to make two requests to delete a bound port? Or is just > logging the ERROR that you've leaked allocations, tsk tsk, enough? I > tend to think the latter is fine until someone comes along saying this > is really hurting them and they have a valid use case for deleting bound > ports out of band from nova. > neutron deines a special role called "advsvc" for advanced network services [1]. I think we can change neutron to block deletion of bound ports for regular users and allow users with "advsvc" role to delete bound ports. I haven't checked which projects currently use "advsvc". [1] https://opendev.org/openstack/neutron/src/branch/master/neutron/conf/policies/port.py#L53-L59 > > -- > > Thanks, > > Matt > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zhipengh512 at gmail.com Sat May 4 00:58:38 2019 From: zhipengh512 at gmail.com (Zhipeng Huang) Date: Sat, 4 May 2019 08:58:38 +0800 Subject: [tc] Proposal: restrict TC activities In-Reply-To: <20190503204942.GB28010@shipstone.jp> References: <20190503204942.GB28010@shipstone.jp> Message-ID: Then it might fit the purpose to rename the technical committee to governance committee or other terms. If we have a technical committee not investing time to lead in technical evolvement of OpenStack, it just seems odd to me. TC should be a place good developers aspired to, not retired to. BTW this is not a OpenStack-only issue but I see across multiple open source communities. On Sat, May 4, 2019 at 4:51 AM Emmet Hikory wrote: > All, > I’ve spent the last few years watching the activities of the > technical committee , and in recent cycles, I’m seeing a significant > increase in both members of our community asking the TC to take action > on things, and the TC volunteering to take action on things in the > course of internal discussions (meetings, #openstack-tc, etc.). In > combination, these trends appear to have significantly increased the > amount of time that members of the technical committee spend on “TC > work”, and decreased the time that they spend on other activities in > OpenStack. As such, I suggest that the Technical Committee be > restricted from actually doing anything beyond approval of merges to the > governance repository. 
> > Firstly, we select members of the technical committee from amongst > those of us who have some of the deepest understanding of the entire > project and frequently those actively involved in multiple projects and > engaged in cross-project coordination on a regular basis. Anything less > than this fails to produce enough name recognition for election. As > such, when asking the TC to be responsible for activities, we should > equally ask whether we wish the very people responsible for the > efficiency of our collaboration to cease doing so in favor of whatever > we may have assigned to the TC. > > Secondly, in order to ensure continuity, we need to provide a means > for rotation of the TC: this is both to allow folk on the TC to pursue > other activities, and to allow folk not on the TC to join the TC and > help with governance and coordination. If we wish to increase the > number of folk who might be eligible for the TC, we do this best by > encouraging them to take on activities that involve many projects or > affect activities over all of OpenStack. These activities must > necessarily be taken by those not current TC members in order to provide > a platform for visibility to allow those doing them to later become TC > members. > > Solutions to both of these issues have been suggested involving > changing the size of the TC. If we decrease the size of the TC, it > becomes less important to provide mechanisms for new people to develop > reputation over the entire project, but this ends up concentrating the > work of the TC to a smaller number of hands, and likely reduces the > volume of work overall accomplished. If we increase the size of the TC, > it becomes less burdensome for the TC to take on these activities, but > this ends up foundering against the question of who in our community has > sufficient experience with all aspects of OpenStack to fill the > remaining seats (and how to maintain a suitable set of folk to provide > TC continuity). > > If we instead simply assert that the TC is explicitly not > responsible for any activities beyond governance approvals, we both > reduce the impact that being elected to the TC has on the ability of our > most prolific contributors to continue their activities and provide a > means for folk who have expressed interest and initiative to broadly > contribute and demonstrate their suitability for nomination in a future > TC election > > Feedback encouraged > > -- > Emmet HIKORY > > > -- Zhipeng (Howard) Huang Principle Engineer OpenStack, Kubernetes, CNCF, LF Edge, ONNX, Kubeflow, OpenSDS, Open Service Broker API, OCP, Hyperledger, ETSI, SNIA, DMTF, W3C -------------- next part -------------- An HTML attachment was scrubbed... URL: From a.settle at outlook.com Sat May 4 03:35:38 2019 From: a.settle at outlook.com (Alexandra Settle) Date: Sat, 4 May 2019 03:35:38 +0000 Subject: [nova][ptg] Summary: docs In-Reply-To: References: Message-ID: I know you have Stephen on the team, but let me know if the team also wants to look further into formalising the information architecture and help reviewing patches. Cheers, Alex Get Outlook for Android ________________________________ From: Eric Fried Sent: Thursday, May 2, 2019 10:14:34 PM To: OpenStack Discuss Subject: [nova][ptg] Summary: docs Summary: Nova docs could use some love. Agreement: Consider doc scrub as a mini-theme (cycle themes to be discussed Saturday) to encourage folks to dedicate some amount of time to reading & validating docs, and opening and/or fixing bugs for discovered issues. 
efried . -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongbin034 at gmail.com Sat May 4 05:12:38 2019 From: hongbin034 at gmail.com (Hongbin Lu) Date: Sat, 4 May 2019 01:12:38 -0400 Subject: [ZUN] Proxy on Docker + Zun In-Reply-To: References: Message-ID: Alejandro, Yes, it might be an proxy issue. According to https://docs.docker.com/config/daemon/systemd/#httphttps-proxy , the NO_PROXY is a list of comma-separated hosts (not a cidr like 10.8.0.0/16 ). So you might want to try: NO_PROXY=localhost,127.0.0.1,10.8.9. 54,... On Fri, May 3, 2019 at 3:09 PM Alejandro Ruiz Bermejo < arbermejo0417 at gmail.com> wrote: > I'm still working on my previous error of the openstack appcontainer run > error state: > > I have Docker working behind a Proxy. As you can see in the Docker info i > attach to this mail. I tried to do the curl http://10.8.9.54:2379/health > with the proxy environment variable and i got timeout error (without it the > curl return the normal healthy state for the etcd cluster). So my question > is if i'm having a problem with the proxy configuration and docker commands > when i'm executing the openstack appcontainer run. And if you know any use > case of someone working with Docker behind a proxy and Zun in the Openstack > environment. > > This is the outputh of > > # systemctl show --property Environment docker > Environment=HTTP_PROXY=http://10.8.7.60:3128/ NO_PROXY=localhost, > 127.0.0.0/8,10.8.0.0/16 HTTPS_PROXY=http://10.8.7.60:3128/ > > And this is the one of > > root at compute /h/team# docker info > Containers: 9 > Running: 0 > Paused: 0 > Stopped: 9 > Images: 7 > Server Version: 18.09.5 > Storage Driver: overlay2 > Backing Filesystem: extfs > Supports d_type: true > Native Overlay Diff: true > Logging Driver: json-file > Cgroup Driver: cgroupfs > Plugins: > Volume: local > Network: bridge host macvlan null overlay > Log: awslogs fluentd gcplogs gelf journald json-file local logentries > splunk syslog > Swarm: inactive > Runtimes: runc > Default Runtime: runc > Init Binary: docker-init > containerd version: bb71b10fd8f58240ca47fbb579b9d1028eea7c84 > runc version: 2b18fe1d885ee5083ef9f0838fee39b62d653e30 > init version: fec3683 > Security Options: > apparmor > seccomp > Profile: default > Kernel Version: 4.15.0-48-generic > Operating System: Ubuntu 18.04.2 LTS > OSType: linux > Architecture: x86_64 > CPUs: 8 > Total Memory: 15.66GiB > Name: compute > ID: W35H:WCPP:AM3K:NENH:FEOR:S23C:N3FZ:QELB:LLUR:USMJ:IM7W:YMFX > Docker Root Dir: /var/lib/docker > Debug Mode (client): false > Debug Mode (server): false > HTTP Proxy: http://10.8.7.60:3128/ > HTTPS Proxy: http://10.8.7.60:3128/ > No Proxy: localhost,127.0.0.0/8,10.8.0.0/16 > Registry: https://index.docker.io/v1/ > Labels: > Experimental: false > Cluster Store: etcd://10.8.9.54:2379 > Insecure Registries: > 127.0.0.0/8 > Live Restore Enabled: false > Product License: Community Engine > > WARNING: API is accessible on http://compute:2375 without encryption. > Access to the remote API is equivalent to root access on the > host. Refer > to the 'Docker daemon attack surface' section in the > documentation for > more information: > https://docs.docker.com/engine/security/security/#docker-daemon-attack-surface > WARNING: No swap limit support > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From persia at shipstone.jp Sat May 4 13:25:50 2019 From: persia at shipstone.jp (Emmet Hikory) Date: Sat, 4 May 2019 22:25:50 +0900 Subject: [tc] Proposal: restrict TC activities In-Reply-To: References: <20190503204942.GB28010@shipstone.jp> Message-ID: <20190504132550.GA28713@shipstone.jp> Zhipeng Huang wrote: > Then it might fit the purpose to rename the technical committee to > governance committee or other terms. If we have a technical committee not > investing time to lead in technical evolvement of OpenStack, it just seems > odd to me. OpenStack has a rich governance structure, including at least the Foundation Board, the User Committee, and the Technical Committee. Within the context of governance, the Technical Committee is responsible for both technical governance of OpenStack and governance of the technical community. It is within that context that "Technical Committee" is the name. I also agree that it is important that members of the Technical Committee are able to invest time to lead in the technical evolution of OpenStack, and this is a significant reason that I propose that the activities of the TC be restricted, precisely so that being elected does not mean that one no longer is able to invest time for this. > TC should be a place good developers aspired to, not retired to. BTW this > is not a OpenStack-only issue but I see across multiple open source > communities. While I agree that it is valuable to have a target for the aspirations of good developers, I am not convinced that OpenStack can be healthy if we restrict our aspirations to nine seats. From my perspective, this causes enough competition that many excellent folk may never be elected, and that some who wish to see their aspirations fufilled may focus activity in other communities where it may be easier to achieve an arbitrary title. Instead, I suggest that developers should aspire to be leaders in the OpenStack comunuity, and be actively involved in determining the future technical direction of OpenStack. I just don't think there needs to be any correlation between this and the mechanics of reviewing changes to the governance repository. -- Emmet HIKORY From jfrancoa at redhat.com Sat May 4 16:02:03 2019 From: jfrancoa at redhat.com (Jose Luis Franco Arza) Date: Sat, 4 May 2019 18:02:03 +0200 Subject: =?UTF-8?Q?Re=3A_=5Btripleo=5D_Nominate_C=C3=A9dric_Jeanneret_=28Tengu=29_for?= =?UTF-8?Q?_tripleo=2Dvalidations_core?= In-Reply-To: <20190418102939.heykaeyphydgocq4@olivia.strider.local> References: <20190418102939.heykaeyphydgocq4@olivia.strider.local> Message-ID: +1 On Thu, Apr 18, 2019 at 12:32 PM Gaël Chamoulaud wrote: > Hi TripleO devs, > > The new Validation Framework is a big step further for the > tripleo-validations > project. In many ways, it improves the way of detecting & reporting > potential > issues during a TripleO deployment. As the mastermind of this new > framework, > Cédric brought a new lease of life to the tripleo-validations project. > That's > why we would highly benefit from his addition to the core reviewer team. > > Assuming that there are no objections, we will add Cédric to the core team > next > week. > > Thanks, Cédric, for your excellent work! > > =Gaël > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From johnsomor at gmail.com Sat May 4 16:25:08 2019 From: johnsomor at gmail.com (Michael Johnson) Date: Sat, 4 May 2019 10:25:08 -0600 Subject: [nova][neutron][ptg] Summary: Leaking resources when ports are deleted out-of-band In-Reply-To: References: <62ef48e0-9425-9191-a648-c1009c1032b7@fried.cc> <1556919312.16566.2@smtp.office365.com> <5f87ea30-0bdf-31a4-a3f5-0e9d201b3665@gmail.com> Message-ID: I think this will have implications for Octavia, but we can work through those. There are cases during cleanup from an error where we delete ports owned by "Octavia" that have not yet be attached to a nova instance. My understanding of the above discussion is that this would not be an issue under this change. However.... We also, currently, manipulate the ports we have hot-plugged (attached) to nova instances where the port "device_owner" has become "compute:nova", mostly for failover scenarios and cases where nova detach fails and we have to revert the action. Now, if the "proper" new procedure is to first detach before deleting the port, we can look at attempting that. But, in the common failure scenarios we see nova failing to complete this, if for example the compute host has been powered off. In this scenario we still need to delete the neutron port for both resource cleanup and quota reasons. This so we can create a new port and attach it to a new instance to recover. I think this change will impact our current port manage flows, so we should proceed cautiously, test heavily, and potentially address some of the nova failure scenarios at the same time. Michael On Fri, May 3, 2019 at 5:23 PM Akihiro Motoki wrote: > > > > On Fri, May 3, 2019 at 4:11 PM Matt Riedemann wrote: >> >> On 5/3/2019 3:35 PM, Balázs Gibizer wrote: >> > 2) Matt had a point after the session that if Neutron enforces that >> > only unbound port can be deleted then not only Nova needs to be changed >> > to unbound a port before delete it, but possibly other Neutron >> > consumers (Octavia?). >> >> And potentially Zun, there might be others, Magnum, Heat, idk? >> >> Anyway, this is a thing that has been around forever which admins >> shouldn't do, do we need to prioritize making this change in both >> neutron and nova to make two requests to delete a bound port? Or is just >> logging the ERROR that you've leaked allocations, tsk tsk, enough? I >> tend to think the latter is fine until someone comes along saying this >> is really hurting them and they have a valid use case for deleting bound >> ports out of band from nova. > > > neutron deines a special role called "advsvc" for advanced network services [1]. > I think we can change neutron to block deletion of bound ports for regular users and > allow users with "advsvc" role to delete bound ports. > I haven't checked which projects currently use "advsvc". > > [1] https://opendev.org/openstack/neutron/src/branch/master/neutron/conf/policies/port.py#L53-L59 > >> >> >> -- >> >> Thanks, >> >> Matt >> From emilien at redhat.com Sat May 4 16:27:34 2019 From: emilien at redhat.com (Emilien Macchi) Date: Sat, 4 May 2019 10:27:34 -0600 Subject: =?UTF-8?Q?Re=3A_=5Btripleo=5D_Nominate_C=C3=A9dric_Jeanneret_=28Tengu=29_for?= =?UTF-8?Q?_tripleo=2Dvalidations_core?= In-Reply-To: References: <20190418102939.heykaeyphydgocq4@olivia.strider.local> Message-ID: I went ahead and added Cédric to the list of TripleO core (there is no tripleo-validation group in Gerrit). 
On Sat, May 4, 2019 at 10:13 AM Jose Luis Franco Arza wrote: > +1 > > On Thu, Apr 18, 2019 at 12:32 PM Gaël Chamoulaud > wrote: > >> Hi TripleO devs, >> >> The new Validation Framework is a big step further for the >> tripleo-validations >> project. In many ways, it improves the way of detecting & reporting >> potential >> issues during a TripleO deployment. As the mastermind of this new >> framework, >> Cédric brought a new lease of life to the tripleo-validations project. >> That's >> why we would highly benefit from his addition to the core reviewer team. >> >> Assuming that there are no objections, we will add Cédric to the core >> team next >> week. >> >> Thanks, Cédric, for your excellent work! >> >> =Gaël >> > -- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... URL: From cdent+os at anticdent.org Sat May 4 16:55:35 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Sat, 4 May 2019 10:55:35 -0600 (MDT) Subject: [placement][nova][ptg] Summary: Testing PlacementFixture effectively In-Reply-To: References: Message-ID: On Fri, 3 May 2019, Chris Dent wrote: > Action: > > cdent will make a story and do this https://review.opendev.org/#/q/topic:story/2005562 -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent tw: @anticdent From balazs.gibizer at ericsson.com Sat May 4 16:57:37 2019 From: balazs.gibizer at ericsson.com (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Sat, 4 May 2019 16:57:37 +0000 Subject: [nova][neutron][ptg] Summary: Leaking resources when ports are deleted out-of-band In-Reply-To: References: <62ef48e0-9425-9191-a648-c1009c1032b7@fried.cc> <1556919312.16566.2@smtp.office365.com> <5f87ea30-0bdf-31a4-a3f5-0e9d201b3665@gmail.com> Message-ID: <1556989044.27606.0@smtp.office365.com> On Sat, May 4, 2019 at 10:25 AM, Michael Johnson wrote: > I think this will have implications for Octavia, but we can work > through those. > > There are cases during cleanup from an error where we delete ports > owned by "Octavia" that have not yet be attached to a nova instance. > My understanding of the above discussion is that this would not be an > issue under this change. If the port is owned by Octavia then the resource leak does not happen. However the propose neutron code / policy change affects this case as well. > > However.... > > We also, currently, manipulate the ports we have hot-plugged > (attached) to nova instances where the port "device_owner" has become > "compute:nova", mostly for failover scenarios and cases where nova > detach fails and we have to revert the action. > > Now, if the "proper" new procedure is to first detach before deleting > the port, we can look at attempting that. But, in the common failure > scenarios we see nova failing to complete this, if for example the > compute host has been powered off. In this scenario we still need to > delete the neutron port for both resource cleanup and quota reasons. > This so we can create a new port and attach it to a new instance to > recover. If Octavai also deletes the VM then force deleting the port is OK from placement resource prespective as the VM delete will make sure we are deleting the leaked port resources. > > I think this change will impact our current port manage flows, so we > should proceed cautiously, test heavily, and potentially address some > of the nova failure scenarios at the same time. After talking to rm_work on #openstack-nova [1] it feels that the policy based solution would work for Octavia. 
So Octavia with the extra policy can still delete the bound port in Neutron safely as Ocatavia also deletes the VM that the port was bound to. That VM delete will reclaim the leaked port resource. The failure to detach a port via nova while the nova-compute is down could be a bug on nova side. cheers, gibi [1] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2019-05-04.log.html#t2019-05-04T16:15:52 > > Michael > > On Fri, May 3, 2019 at 5:23 PM Akihiro Motoki > wrote: >> >> >> >> On Fri, May 3, 2019 at 4:11 PM Matt Riedemann >> wrote: >>> >>> On 5/3/2019 3:35 PM, Balázs Gibizer wrote: >>> > 2) Matt had a point after the session that if Neutron enforces >>> that >>> > only unbound port can be deleted then not only Nova needs to be >>> changed >>> > to unbound a port before delete it, but possibly other Neutron >>> > consumers (Octavia?). >>> >>> And potentially Zun, there might be others, Magnum, Heat, idk? >>> >>> Anyway, this is a thing that has been around forever which admins >>> shouldn't do, do we need to prioritize making this change in both >>> neutron and nova to make two requests to delete a bound port? Or >>> is just >>> logging the ERROR that you've leaked allocations, tsk tsk, enough? >>> I >>> tend to think the latter is fine until someone comes along saying >>> this >>> is really hurting them and they have a valid use case for deleting >>> bound >>> ports out of band from nova. >> >> >> neutron deines a special role called "advsvc" for advanced network >> services [1]. >> I think we can change neutron to block deletion of bound ports for >> regular users and >> allow users with "advsvc" role to delete bound ports. >> I haven't checked which projects currently use "advsvc". >> >> [1] >> https://protect2.fireeye.com/url?k=e82c8753-b4a78c60-e82cc7c8-865bb277df6a-a57d1b5660e0038e&u=https://opendev.org/openstack/neutron/src/branch/master/neutron/conf/policies/port.py#L53-L59 >> >>> >>> >>> -- >>> >>> Thanks, >>> >>> Matt >>> > From cdent+os at anticdent.org Sat May 4 18:09:49 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Sat, 4 May 2019 12:09:49 -0600 (MDT) Subject: [placement][nova][ptg] Summary: Nested Magic With Placement In-Reply-To: References: Message-ID: On Fri, 3 May 2019, Chris Dent wrote: > * This (Friday) afternoon at the PTG I'll be creating rfe stories > associated with these changes. If you'd like to help with that, find > me in the placement room (109). We'll work out whether those > stories needs specs in the normally processing of the stories. > We'll also need to find owners for many of them. I decided to capture all of this in one story: https://storyboard.openstack.org/#!/story/2005575 which will likely need to be broken into several stories, or at least several detailed tasks. We will also need to determine what of it needs a spec (there's already one in progress for the request group mapping), if one spec will be sufficient, or we can get away without one. And people. Always with the people. 
-- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent tw: @anticdent From tetsuro.nakamura.bc at hco.ntt.co.jp Sat May 4 18:40:04 2019 From: tetsuro.nakamura.bc at hco.ntt.co.jp (Tetsuro Nakamura) Date: Sun, 05 May 2019 03:40:04 +0900 Subject: [placement][nova][ptg] resource provider affinity In-Reply-To: References: <776bc9b18cf33713708c22d893bd2a46d7a899ed.camel@redhat.com> <1CC272501B5BC543A05DB90AA509DED527552AD6@fmsmsx122.amr.corp.intel.com> <97bd8e53-0285-1c92-845f-21098b0b0e38@gmail.com> <21aa22e7-be7d-8ecf-b5bd-9c6afcd789f5@fried.cc> <27624C30-2BB6-43DF-9613-783674389C0B@fried.cc> <1556631941.24201.1@smtp.office365.com> <264f10b8-05dc-5280-28af-1f29cae91821@hco.ntt.co.jp> <4aa76244-fce0-86f3-a6f5-cd7f4d8cb2f0@fried.cc> Message-ID: <03922b54-994e-dcae-8543-7c9c2f75b87d@hco.ntt.co.jp> Okay, I was missing that at the point to merge each candidate from each request groups, all the rps info in the trees are already in ProviderSummaries, and we can use them without an additional query. It looks like that this can be done without impacting the performance of existing requests that have no queryparam for affinity, so I'm good with this and can volunteer it in Placement since this is more of general "subtree" thing, but I'd like to say that looking into tracking PCPU feature in Nova and see the related problems should precede any Nova related items to model NUMA in Placement. On 2019/05/04 0:03, Eric Fried wrote: >> It enables something like: >> * group_resources=1:2:!3:!4 >> which means 1 and 2 should be in the same group but 3 shoudn't be the >> descendents of 1 or 2, so as 4. > In a symmetric world, this one is a little ambiguous to me. Does it mean > 4 shouldn't be in the same subtree as 3 as well? I thought the negative folks were just refusing to be with in the positive folks. Looks like there are use cases where we need multiple group_resources? - I want 1, 2 in the same subtree, and 3, 4 in the same subtree but the two subtrees should be separated: * group_resources=1:2:!3:!4&group_resources=3:4 -- Tetsuro Nakamura NTT Network Service Systems Laboratories TEL:0422 59 6914(National)/+81 422 59 6914(International) 3-9-11, Midori-Cho Musashino-Shi, Tokyo 180-8585 Japan From kchamart at redhat.com Sat May 4 18:45:17 2019 From: kchamart at redhat.com (Kashyap Chamarthy) Date: Sat, 4 May 2019 20:45:17 +0200 Subject: [nova][ptg] Summary: Secure Boot support for QEMU- and KVM-based Nova instances Message-ID: <20190504184517.GF28897@paraplu> Spec: https://review.opendev.org/#/c/506720/ -- Add "Secure Boot support for KVM & QEMU guests" spec Summary: - Major work in all the lower-level dependencies: OVMF, QEMU and libvirt is ready. Nova can now start integrating this feature. (Refer to the spec for the details.) - [IN-PROGRESS] Ensure that the Linux distributions Nova cares about ship the OVMF firmware descriptor files. (Requires QEMU 4.1, coming out in August. Refer this QEMU patch series; merged in Git master: https://lists.nongnu.org/archive/html/qemu-devel/2019-04/msg03799.html bundle edk2 platform firmware with QEMU.) - NOTE: This is not a blocker for Nova. We can parallely hammer away at the work items outlined in the spec. - [IN-PROGRESS] Kashyap is working with Debian folks to ship a tool ('ovmf-vars-generator') to enroll default UEFI keys for Secure Boot. 
- Filed a Debian "RFP" for it https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=927414 - Fedora already ships it; Ubuntu is working on it (https://launchpad.net/ubuntu/+source/edk2/0~20190309.89910a39-1ubuntu1) - NOTE: This is not a blocker, but a nice-to-have, because distributions already ship an OVMF "VARS" (variable store file) with default UEFI keys enrolled. - ACTION: John Garbutt and Chris Friesen to review the Nova spec. (Thanks!) -- /kashyap From josephine.seifert at secustack.com Sat May 4 18:57:38 2019 From: josephine.seifert at secustack.com (Josephine Seifert) Date: Sat, 4 May 2019 20:57:38 +0200 Subject: [nova][cinder][glance][Barbican]Finding Timeslot for weekly Image Encryption IRC meeting Message-ID: Hello, as a result from the Summit and the PTG, I would like to hold a weekly IRC-meeting for the Image Encryption (soon to be a pop-up team).  As I work in Europe I have made a doodle poll, with timeslots I can attend and hopefully many of you. If you would like to join in a weekly meeting, please fill out the poll and state your name and the project you are working in: https://doodle.com/poll/wtg9ha3e5dvym6yt Thank you Josephine (Luzi) From sukhdevkapur at gmail.com Sat May 4 19:43:29 2019 From: sukhdevkapur at gmail.com (Sukhdev Kapur) Date: Sat, 4 May 2019 12:43:29 -0700 Subject: [ironic][neutron][ops] Ironic multi-tenant networking, VMs In-Reply-To: References: Message-ID: Jeremy, If you want to use VxLAN networks for the bremetal hosts, you would use ML2 VLAN networks, as Julia described, between the host and switch port. That VLAN will then terminate into a VTAP on the switch port which will carry appropriate tags in the VxLAN overlay. Hope this helps -Sukhdev On Thu, May 2, 2019 at 9:28 PM Jeremy Freudberg wrote: > Thanks Julia; this is helpful. > > Thanks also for reading my mind a bit, as I am thinking of the VXLAN > case... I can't help but notice that in the Ironic CI jobs, multi > tenant networking being used seems to entail VLANs as the tenant > network type (instead of VXLAN). Is it just coincidence / how the gate > just is, or is it hinting something about how VXLAN and bare metal get > along? > > On Wed, May 1, 2019 at 6:38 PM Julia Kreger > wrote: > > > > Greetings Jeremy, > > > > Best Practice wise, I'm not directly aware of any. It is largely going > > to depend upon your Neutron ML2 drivers and network fabric. > > > > In essence, you'll need an ML2 driver which supports the vnic type of > > "baremetal", which is able to able to orchestrate the switch port port > > binding configuration in your network fabric. If your using vlan > > networks, in essence you'll end up with a neutron physical network > > which is also a trunk port to the network fabric, and the ML2 driver > > would then appropriately tag the port(s) for the baremetal node to the > > networks required. In the CI gate, we do this in the "multitenant" > > jobs where networking-generic-switch modifies the OVS port > > configurations directly. > > > > If specifically vxlan is what your looking to use between VMs and > > baremetal nodes, I'm unsure of how you would actually configure that, > > but in essence the VXLANs would still need to be terminated on the > > switch port via the ML2 driver. > > > > In term of Ironic's documentation, If you haven't already seen it, you > > might want to check out ironic's multi-tenancy documentation[1]. 
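To make the approach described above a bit more concrete, here is a minimal, hypothetical openstacksdk sketch (the clouds.yaml entry, network, flavor, and image names are all made up, and the real behaviour depends entirely on which ML2 driver and switch fabric is in use): create a port on an ML2 VLAN tenant network with the "baremetal" vnic type, then boot the Ironic-backed instance on that port so the ML2 driver can bind the physical switch port.

# Hypothetical sketch -- not taken from the Ironic docs; names are invented.
import openstack

conn = openstack.connect(cloud='mycloud')  # assumes a clouds.yaml entry

# An ML2 VLAN tenant network that both VMs and bare metal nodes attach to.
net = conn.network.find_network('tenant-vlan-net')

# vnic_type "baremetal" is what lets a capable ML2 driver bind the port to a
# physical switch port instead of a virtual interface.
port = conn.network.create_port(
    network_id=net.id,
    binding_vnic_type='baremetal',
)

# Boot the Ironic-backed instance on that port; Nova/Ironic hand the port to
# the ML2 driver, which tags the switch port for the network's VLAN.
server = conn.compute.create_server(
    name='bm-node-1',
    flavor_id=conn.compute.find_flavor('baremetal-flavor').id,
    image_id=conn.compute.find_image('bm-image').id,
    networks=[{'port': port.id}],
)

Regular VMs created on the same VLAN network need no special handling, which is what makes the shared tenant network the simplest way to have VMs and bare metal hosts coexist.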
> > > > -Julia > > > > [1]: https://docs.openstack.org/ironic/latest/admin/multitenancy.html > > > > On Wed, May 1, 2019 at 10:53 AM Jeremy Freudberg > > wrote: > > > > > > Hi all, > > > > > > I'm wondering if anyone has any best practices for Ironic bare metal > > > nodes and regular VMs living on the same network. I'm sure if involves > > > Ironic's `neutron` multi-tenant network driver, but I'm a bit hazy on > > > the rest of the details (still very much in the early stages of > > > exploring Ironic). Surely it's possible, but I haven't seen mention of > > > this anywhere (except the very old spec from 2015 about introducing > > > ML2 support into Ironic) nor is there a gate job resembling this > > > specific use. > > > > > > Ideas? > > > > > > Thanks, > > > Jeremy > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at fried.cc Sat May 4 20:02:12 2019 From: openstack at fried.cc (Eric Fried) Date: Sat, 4 May 2019 14:02:12 -0600 Subject: [placement][nova][ptg] resource provider affinity In-Reply-To: <03922b54-994e-dcae-8543-7c9c2f75b87d@hco.ntt.co.jp> References: <1CC272501B5BC543A05DB90AA509DED527552AD6@fmsmsx122.amr.corp.intel.com> <97bd8e53-0285-1c92-845f-21098b0b0e38@gmail.com> <21aa22e7-be7d-8ecf-b5bd-9c6afcd789f5@fried.cc> <27624C30-2BB6-43DF-9613-783674389C0B@fried.cc> <1556631941.24201.1@smtp.office365.com> <264f10b8-05dc-5280-28af-1f29cae91821@hco.ntt.co.jp> <4aa76244-fce0-86f3-a6f5-cd7f4d8cb2f0@fried.cc> <03922b54-994e-dcae-8543-7c9c2f75b87d@hco.ntt.co.jp> Message-ID: > It looks like that this can be done without impacting the performance of > existing requests that have no queryparam for affinity, Well, the concern is that doing this at _merge_candidates time (i.e. in python) may be slow. But yeah, let's not solve that until/unless we see it's truly a problem. > but I'd like to say that looking into tracking PCPU feature in Nova and > see the related problems should precede any Nova related items to model > NUMA in Placement. To be clear, placement doesn't need any changes for this. I definitely don't think we should wait for it to land before starting on the placement side of the affinity work. > I thought the negative folks were just refusing to be with in the > positive folks. > Looks like there are use cases where we need multiple group_resources? Yes, certainly eventually we'll need this, even just for positive affinity. Example: I want two VCPUs, two chunks of memory, and two accelerators. Each VCPU/memory/accelerator combo must be affined to the same NUMA node so I can maximize the performance of the accelerator. But I don't care whether both combos come from the same or different NUMA nodes: ?resources_compute1=VCPU:1,MEMORY_MB:1024 &resources_accel1=FPGA:1 &same_subtree:compute1,accel1 &resources_compute2=VCPU:1,MEMORY_MB:1024 &resources_accel2=FPGA:1 &same_subtree:compute2,accel2 and what I want to get in return is: candidates: (1) NUMA1 has VCPU:1,MEMORY_MB:1024,FPGA:1; NUMA2 likewise (2) NUMA1 has everything (3) NUMA2 has everything Slight aside, could we do this with can_split and just one same_subtree? 
I'm not sure you could expect the intended result from: ?resources_compute=VCPU:2,MEMORY_MB:2048 &resources_accel=FPGA:2 &same_subtree:compute,accel &can_split:compute,accel Intuitively, I think the above *either* means you don't get (1), *or* it means you can get (1)-(3) *plus* things like: (4) NUMA1 has VCPU:2,MEMORY_MB:2048; NUMA2 has FPGA:2 > - I want 1, 2 in the same subtree, and 3, 4 in the same subtree but the > two subtrees should be separated: > > * group_resources=1:2:!3:!4&group_resources=3:4 Right, and this too. As a first pass, I would be fine with supporting only positive affinity. And if it makes things significantly easier, supporting only a single group_resources per call. efried . From dciabrin at redhat.com Sat May 4 21:14:50 2019 From: dciabrin at redhat.com (Damien Ciabrini) Date: Sat, 4 May 2019 23:14:50 +0200 Subject: [oslo][oslo-messaging][nova] Stein nova-api AMQP issue running under uWSGI In-Reply-To: <20190503175904.GA26117@holtby> References: <229a2a53-870f-44c3-5e0c-6cfa9d45d0c5@oracle.com> <3275304e-d717-8b89-557e-b650fc4f661a@oracle.com> <20190420063850.GA18527@holtby.speedport.ip> <8b9cb0e4-b3a4-986a-be59-5bba6ae00f4e@nemebean.com> <20190503175904.GA26117@holtby> Message-ID: On Fri, May 3, 2019 at 7:59 PM Michele Baldessari wrote: > On Mon, Apr 22, 2019 at 01:21:03PM -0500, Ben Nemec wrote: > > > > > > On 4/22/19 12:53 PM, Alex Schultz wrote: > > > On Mon, Apr 22, 2019 at 11:28 AM Ben Nemec > wrote: > > > > > > > > > > > > > > > > On 4/20/19 1:38 AM, Michele Baldessari wrote: > > > > > On Fri, Apr 19, 2019 at 03:20:44PM -0700, > iain.macdonnell at oracle.com wrote: > > > > > > > > > > > > Today I discovered that this problem appears to be caused by > eventlet > > > > > > monkey-patching. I've created a bug for it: > > > > > > > > > > > > https://bugs.launchpad.net/nova/+bug/1825584 > > > > > > > > > > Hi, > > > > > > > > > > just for completeness we see this very same issue also with > > > > > mistral (actually it was the first service where we noticed the > missed > > > > > heartbeats). iirc Alex Schultz mentioned seeing it in ironic as > well, > > > > > although I have not personally observed it there yet. > > > > > > > > Is Mistral also mixing eventlet monkeypatching and WSGI? > > > > > > > > > > Looks like there is monkey patching, however we noticed it with the > > > engine/executor. So it's likely not just wsgi. I think I also saw it > > > in the ironic-conductor, though I'd have to try it out again. I'll > > > spin up an undercloud today and see if I can get a more complete list > > > of affected services. It was pretty easy to reproduce. > > > > Okay, I asked because if there's no WSGI/Eventlet combination then this > may > > be different from the Nova issue that prompted this thread. It sounds > like > > that was being caused by a bad interaction between WSGI and some Eventlet > > timers. If there's no WSGI involved then I wouldn't expect that to > happen. > > > > I guess we'll see what further investigation turns up, but based on the > > preliminary information there may be two bugs here. > > So just to get some closure on this error that we have seen around > mistral executor and tripleo with python3: this was due to the ansible > action that called subprocess which has a different implementation in > python3 and so the monkeypatching needs to be adapted. 
> > Review which fixes it for us is here: > https://review.opendev.org/#/c/656901/ > > Damien and I think the nova_api/eventlet/mod_wsgi has a separate root-cause > (although we have not spent all too much time on that one yet) > > Right, after further investigation, it appears that the problem we saw under mod_wsgi was due to monkey patching, as Iain originally reported. It has nothing to do with our work on healthchecks. It turns out that running the AMQP heartbeat thread under mod_wsgi doesn't work when the threading library is monkey_patched, because the thread waits on a data structure [1] that has been monkey patched [2], which makes it yield its execution instead of sleeping for 15s. Because mod_wsgi stops the execution of its embedded interpreter, the AMQP heartbeat thread can't be resumed until there's a message to be processed in the mod_wsgi queue, which would resume the python interpreter and make eventlet resume the thread. Disabling monkey-patching in nova_api makes the scheduling issue go away. Note: other services like heat-api do not use monkey patching and aren't affected, so this seem to confirm that monkey-patching shouldn't happen in nova_api running under mod_wsgi in the first place. [1] https://github.com/openstack/oslo.messaging/blob/master/oslo_messaging/_drivers/impl_rabbit.py#L904 [2] https://github.com/openstack/oslo.utils/blob/master/oslo_utils/eventletutils.py#L182 -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at fried.cc Sat May 4 22:43:26 2019 From: openstack at fried.cc (Eric Fried) Date: Sat, 4 May 2019 16:43:26 -0600 Subject: [nova][all][ptg] Summary: Same-Company Approvals Message-ID: (NB: I tagged [all] because it would be interesting to know where other teams stand on this issue.) Etherpad: https://etherpad.openstack.org/p/nova-ptg-train-governance Summary: - There is a (currently unwritten? at least for Nova) rule that a patch should not be approved exclusively by cores from the same company. This is rife with nuance, including but not limited to: - Usually (but not always) relevant when the patch was proposed by member of same company - N/A for trivial things like typo fixes - The issue is: - Should the rule be abolished? and/or - Should the rule be written down? Consensus (not unanimous): - The rule should not be abolished. There are cases where both the impetus and the subject matter expertise for a patch all reside within one company. In such cases, at least one core from another company should still be engaged and provide a "procedural +2" - much like cores proxy SME +1s when there's no core with deep expertise. - If there is reasonable justification for bending the rules (e.g. typo fixes as noted above, some piece of work clearly not related to the company's interest, unwedging the gate, etc.) said justification should be clearly documented in review commentary. - The rule should not be documented (this email notwithstanding). This would either encourage loopholing or turn into a huge detailed legal tome that nobody will read. It would also *require* enforcement, which is difficult and awkward. Overall, we should be able to trust cores to act in good faith and in the appropriate spirit. efried . 
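Coming back to the oslo.messaging / mod_wsgi discussion a little further up this digest: the monkey-patching interaction described there can be reproduced with a small, self-contained Python sketch. This is purely illustrative (it is not nova or oslo.messaging code; the 15-second interval simply mirrors the heartbeat period mentioned in the thread):

# Illustrative sketch only -- not nova or oslo.messaging code.
import eventlet
eventlet.monkey_patch()  # done at import time by services that use eventlet

import threading
import time

stop = threading.Event()  # after monkey_patch() this is a green Event, not an OS primitive

def heartbeat():
    # Intended behaviour: wake up every 15 seconds and send a heartbeat.
    # The patched Event.wait() yields to the eventlet hub instead of blocking
    # in the OS, so the wakeup only happens if something keeps running the hub.
    while not stop.wait(timeout=15):
        print("heartbeat at", time.time())

t = threading.Thread(target=heartbeat)  # also a green thread after monkey_patch()
t.start()

# In a standalone process the main thread reaches this (patched) sleep, the hub
# runs, and heartbeats fire on schedule. Embedded in a server that suspends the
# interpreter between requests, there is no such cooperative point, so the
# heartbeat thread appears to stall until the next request arrives.
time.sleep(60)
stop.set()
t.join()

Run standalone, this prints a heartbeat roughly every 15 seconds; the reported nova-api behaviour corresponds to the case where nothing drives the hub between requests.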
From openstack at fried.cc Sat May 4 22:56:34 2019 From: openstack at fried.cc (Eric Fried) Date: Sat, 4 May 2019 16:56:34 -0600 Subject: [placement][nova][ptg] resource provider affinity In-Reply-To: References: <1CC272501B5BC543A05DB90AA509DED527552AD6@fmsmsx122.amr.corp.intel.com> <97bd8e53-0285-1c92-845f-21098b0b0e38@gmail.com> <21aa22e7-be7d-8ecf-b5bd-9c6afcd789f5@fried.cc> <27624C30-2BB6-43DF-9613-783674389C0B@fried.cc> <1556631941.24201.1@smtp.office365.com> <264f10b8-05dc-5280-28af-1f29cae91821@hco.ntt.co.jp> <4aa76244-fce0-86f3-a6f5-cd7f4d8cb2f0@fried.cc> <03922b54-994e-dcae-8543-7c9c2f75b87d@hco.ntt.co.jp> Message-ID: <5fd214e8-4822-53a5-a7d6-622c5133a26f@fried.cc> For those of you following along at home, we had a design session a couple of hours ago and hammered out the broad strokes of this work, including rough prioritization of the various pieces. Chris has updated the story [1] with a couple of notes; expect details and specs to emerge therefrom. efried [1] https://storyboard.openstack.org/#!/story/2005575 From openstack at fried.cc Sat May 4 23:32:02 2019 From: openstack at fried.cc (Eric Fried) Date: Sat, 4 May 2019 17:32:02 -0600 Subject: [nova][ptg] Summary/Outcome: Train Cycle Themes Message-ID: Etherpad: https://etherpad.openstack.org/p/nova-train-themes Summary: In Stein, we started doing cycle themes instead of priorities. The distinction being that themes should represent tangible user (as in OpenStack consumer) facing value, whereas priorities represent what work items we want to do. Outcome: We decided on themes around: (1) The use of placement (2) Cyborg integration (3) Docs I have curated the etherpad and discussions and proposed these themes to the nova-specs repository at https://review.opendev.org/657171 efried . From morgan.fainberg at gmail.com Sun May 5 01:19:48 2019 From: morgan.fainberg at gmail.com (Morgan Fainberg) Date: Sat, 4 May 2019 19:19:48 -0600 Subject: [nova][all][ptg] Summary: Same-Company Approvals In-Reply-To: References: Message-ID: On Sat, May 4, 2019, 16:48 Eric Fried wrote: > (NB: I tagged [all] because it would be interesting to know where other > teams stand on this issue.) > > Etherpad: https://etherpad.openstack.org/p/nova-ptg-train-governance > > Summary: > - There is a (currently unwritten? at least for Nova) rule that a patch > should not be approved exclusively by cores from the same company. This > is rife with nuance, including but not limited to: > - Usually (but not always) relevant when the patch was proposed by > member of same company > - N/A for trivial things like typo fixes > - The issue is: > - Should the rule be abolished? and/or > - Should the rule be written down? > > Consensus (not unanimous): > - The rule should not be abolished. There are cases where both the > impetus and the subject matter expertise for a patch all reside within > one company. In such cases, at least one core from another company > should still be engaged and provide a "procedural +2" - much like cores > proxy SME +1s when there's no core with deep expertise. > - If there is reasonable justification for bending the rules (e.g. typo > fixes as noted above, some piece of work clearly not related to the > company's interest, unwedging the gate, etc.) said justification should > be clearly documented in review commentary. > - The rule should not be documented (this email notwithstanding). This > would either encourage loopholing or turn into a huge detailed legal > tome that nobody will read. 
It would also *require* enforcement, which > is difficult and awkward. Overall, we should be able to trust cores to > act in good faith and in the appropriate spirit. > > efried > . > Keystone used to have the same policy outlined in this email (with much of the same nuance and exceptions). Without going into crazy details (as the contributor and core numbers went down), we opted to really lean on "Overall, we should be able to trust cores to act in good faith". We abolished the rule and the cores always ask for outside input when the familiarity lies outside of the team. We often also pull in cores more familiar with the code sometimes ending up with 3x+2s before we workflow the patch. Personally I don't like the "this is an unwritten rule and it shouldn't be documented"; if documenting and enforcement of the rule elicits worry of gaming the system or being a dense some not read, in my mind (and experience) the rule may not be worth having. I voice my opinion with the caveat that every team is different. If the rule works, and helps the team (Nova in this case) feel more confident in the management of code, the rule has a place to live on. What works for one team doesn't always work for another. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tomi.juvonen at nokia.com Sun May 5 01:28:59 2019 From: tomi.juvonen at nokia.com (Juvonen, Tomi (Nokia - FI/Espoo)) Date: Sun, 5 May 2019 01:28:59 +0000 Subject: [fenix][ptg] summary Message-ID: Fenix Train PTG, What to do next, prioritizing https://etherpad.openstack.org/p/DEN2019-fenix-PTG Two non-Telco users would like to use Fenix to maintain their cloud. For this, Fenix need to prioritize work so we can provide production ready framework without Telco features first. Work is now prioritized in the Etherpad and missing things should also be added to storyboard next week. Fenix and ETSI NFV synch https://etherpad.openstack.org/p/DEN2019-fenix-ETSI-NFV-PTG There was also a discussion about supporting ETSI NFV defined constraints. Some instance and anti-affinity group constraints could be in Nova. Anyhow, for Fenix to be generic for any cloud, it would make sense to have more information kept within Fenix. This needs further investigation. Then there was a proposal of having direct subscription to Fenix from VNFM side instead of using subscription to AODH to have event alarm from Fenix notification to VNFM. One bad thing here was that VIM shouldn't have direct API call to external system. The current notification / AODH was also nice for any user to build simple manager just to receive the hint what new capability is coming to application (non-Telco) or to still have some simple solution to interact at the time of maintenance. Thanks, Tomi (tojuvone) -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmann at ghanshyammann.com Sun May 5 07:10:35 2019 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Sun, 05 May 2019 02:10:35 -0500 Subject: [qa][ptg][patrole] RBAC testing improvement ideas for Patrole Message-ID: <16a86d4834e.e46610fc23956.8020827235456111857@ghanshyammann.com> Patrole is emerging as a good tool for RBAC testing. AT&T already running it on their production cloud and we have got a good amount of interest/feedback from other operators. We had few discussions regarding the Patrole testing improvement during PTG among QA, Nova, Keystone team. I am writing the summary of those discussions below and would like to get the opinion from Felipe & Sergey also. 1. 
How to improve Patrole testing time:
Currently, Patrole tests perform the complete API operation, which takes time and makes Patrole runs very long. Patrole is only responsible for testing the policies, so it does not need to wait for the API operation to complete. John has a good idea to handle that via a flag: if the flag is enabled (per service, and disabled by default), oslo.policy can return a different error code on success (other than 403). The API can return a response with that error code, which Patrole can treat as a pass. Morgan raises a good point about making it per API call rather than global. We can do that as a next step; let's start with the global flag per service for now.
- https://etherpad.openstack.org/p/ptg-train-xproj-nova-keystone

Another thing we should improve in the current Patrole jobs is to separate the jobs per service. Currently, all 5 services are installed and run in a single job. Running everything on the Patrole gate is fine, but a project's own gate does not need to run any other service's tests. For example, a patrole-keystone job can install only keystone and run only the keystone tests. This way a project can reuse the Patrole jobs directly and does not need to prepare a separate job.

2. How to run Patrole tests with all negative and positive combinations for all scope + default role combinations:
- The current patrole-admin/member/reader jobs are able to test the negative pattern. For example, the patrole-member job tests the admin APIs in a negative way and makes sure a test passes only if the member role gets a 403.
- As we also have scope_type support, we need to extend the jobs to run all 9 combinations of the 3 scopes (system, project, domain) and 3 roles (admin, member, reader).
- Option 1: run 9 different jobs, one per combination, as we currently do for the admin, member, and reader roles. The issue with this approach is that the gate will take a lot of time to run these 9 jobs separately.
- Option 2: run all 9 combinations in a single job, looping the tests over the different scope/role combinations. This might require converting the current [role] config option to a list type, per service, so that the user can configure which default roles are available for the corresponding service. This option can save a lot of time by avoiding the devstack installation time of 9 separate jobs.

-gmann

From gmann at ghanshyammann.com Sun May 5 07:18:08 2019
From: gmann at ghanshyammann.com (Ghanshyam Mann)
Date: Sun, 05 May 2019 02:18:08 -0500
Subject: [qa][ptg][nova][cinder][keystone][neutron][glance][swift][placement] How to make integrated-gate testing (tempest-full) more stable and fast
Message-ID: <16a86db6ccd.d787148123989.2198391414179782565@ghanshyammann.com>

The current integrated-gate jobs (tempest-full) are not very stable due to various bugs, especially timeouts. We tried to improve this by filtering the slow tests into the separate tempest-slow job, but the situation has not improved much.

We talked about ideas to make the jobs more stable and faster for projects, especially when a failure is not related to the project in question. We are planning to split the integrated-gate template (only the tempest-full job as a first step) per related services.

Idea:
- Run only the dependent services' tests on each project gate.
- The Tempest gate will keep running all services' tests as the integrated gate, at a centralized place, without any change to the current job.
- Each project can run the template mentioned below.
- All the templates below will be defined and maintained by the QA team.
I would like input from each of the 6 services that run the integrated-gate jobs:

1. "Integrated-gate-networking" (job to run on the neutron gate)
Tests to run in this template: neutron APIs, nova APIs, keystone APIs(?), and all scenario tests currently running in tempest-full in the same way (meaning non-slow and in serial).
Improvement for the neutron gate: exclude the cinder API tests, glance API tests, and swift API tests.

2. "Integrated-gate-storage" (job to run on the cinder and glance gates)
Tests to run in this template: cinder APIs, glance APIs, swift APIs, nova APIs, and all scenario tests currently running in tempest-full in the same way (meaning non-slow and in serial).
Improvement for the cinder and glance gates: exclude the neutron API tests and keystone API tests.

3. "Integrated-gate-object-storage" (job to run on the swift gate)
Tests to run in this template: cinder APIs, glance APIs, swift APIs, and all scenario tests currently running in tempest-full in the same way (meaning non-slow and in serial).
Improvement for the swift gate: exclude the neutron API tests, keystone API tests, and nova API tests.
Note: swift does not run the integrated-gate as of now.

4. "Integrated-gate-compute" (job to run on the nova gate)
Tests to run in this template: nova APIs, cinder APIs, glance APIs(?), neutron APIs, and all scenario tests currently running in tempest-full in the same way (meaning non-slow and in serial).
Improvement for the nova gate: exclude the swift API tests (not running in the current job, but they might be in the future) and keystone API tests.

5. "Integrated-gate-identity" (job to run on the keystone gate)
Tests to run: all, since every project uses keystone, so we might need to run everything that runs in the integrated-gate today.
But is keystone actually used differently by each service? If not, is it enough to run only a single service's tests, say nova or neutron?

6. "Integrated-gate-placement" (job to run on the placement gate)
Tests to run in this template: nova API tests, neutron API tests, scenario tests, and any new service that depends on the placement APIs.
Improvement for the placement gate: exclude the glance API tests, cinder API tests, swift API tests, and keystone API tests.

Thoughts on this approach?

The important point is that we must not lose integrated testing coverage for any project, so I would like each project's view on whether we are missing any dependency (among the proposed test removals) in the templates above.

- https://etherpad.openstack.org/p/qa-train-ptg

-gmann

From gmann at ghanshyammann.com Sun May 5 07:21:32 2019
From: gmann at ghanshyammann.com (Ghanshyam Mann)
Date: Sun, 05 May 2019 02:21:32 -0500
Subject: [qa][form][ptg] QA Summary for Forum & PTG
Message-ID: <16a86de8992.f13a438724004.7062826677271782113@ghanshyammann.com>

Hello Everyone,

We had a good discussion at the QA forum and PTG. I am summarizing it here and will start separate threads for the few topics that need more feedback.

Summit: QA Forum sessions:

1. OpenStack QA - Project Update: Tuesday, April 30, 2:35pm-2:55pm
We gave an update on what we finished in Stein and a draft plan for the Train cycle. The good news is that there is still a lot of activity going on in QA: across the QA projects, we did more than 3000 reviews and 750 commits. The video is not up yet, so I am copying the slide link below.
Slides: https://docs.google.com/presentation/d/10zupeFZuOlxroAMl29qVJl78nD4_YWHkQxANNVlIjE0/edit?ts=5cc73ae8#slide=id.p1

2. OpenStack QA - Project Onboarding: Wednesday, May 1, 9:00am-9:40am
We did host the QA onboarding session, but there were only 3 attendees and no new contributors.
I think it is hard to see any new contributor in summits now so I am thinking whether we should host the onboarding sessions from next time. Etherpad: https://etherpad.openstack.org/p/DEN-qa-onboarding 3. Users / Operators adoption of QA tools / plugins : Wednesday, May 1, 10:50am-11:30am3. As usual, we had more attendees in this session and useful feedback. There are few tooling is being shared by attendees: 1. Python hardware module for bare metal detailed hardware inspection & anomaly detection https://github.com/redhat-cip/hardware 2. Workload testing: https://opendev.org/x/tobiko/ Another good idea from Doug was plugin feature in openstack-health dashboard. That is something we discussed in PTG. For more details on this, refer the PTG " OpenStack-health improvement" section. Etherpad: https://etherpad.openstack.org/p/Den-forum-qa-ops-user-feedback QA PTG: 2nd - 3rd May: We were 3-4 attendee in the room always and others attended per topics. Good discussions and few good improvement ideas about gate stability and dashboard etc. 1. Topic: Stein Retrospective We collect good and need improvement things in this session. In term of good things, we completed the OpenStack gate migration fro Xenial to Bionic, lot of reviews and code. Doug from AT&T mentioned about to add tempest and patrole to gates and check in their production deployment process, "Thank you for all of the hard work from the QA team!!!" Slow reviews are a concern as we have a good number of the incoming request. This is something we should improve in Train. Action items: gmann: start the plan for backlogs especially for review and doc cleanup. masayukig: plan to have resource leakage check in gate. ds6901:will work with his team to clean up leaks and submit bugs 2. Topic: Keystone system-scope testing QA and Keystone team gathered together in this cross-project session about next steps on system scope testing. We talked on multiple points about how to cover all new roles for system scope and how to keep the backward compatibility testing for stable branches still testing the without system scope. We decided to move forward for system_admin as of now and fall back the system_admin to project scope if there is no system_scope testing flag is true on Tempest side (this will cover the stable branch testing unaffected). We agreed : - To move forward with system admin - https://review.opendev.org/#/c/604909/ - Add tempest job to test system scope - https://review.opendev.org/#/c/614484/ - Then add to tempest full - gmann - Then add testing for system reader - Investigate more advanced RBAC testing with Patrole - gmann Etherpad: https://etherpad.openstack.org/p/keystone-train-ptg-testing-system-scope-in-tempest 3. Topic: new whitebox plugin for tempest: This is a new idea from artom about testing things outside of Tempest's scope (currently mostly used to check instance XML for NFV use case tests). Currently, this tool does ssh into VM and fetch the xml for further verification etc. We agreed on point to avoid any duplicate test verification from the Tempest or nova functional tests This is good to tests from more extra verification by going inside VM like after migration data, CPU pinning etc. As next step artom to propose the QA spec with details and proposal of this plugin under QA program. 4. Topic: Document the QA process or TODO things for releases, stable branch cut: Idea is to start a centralized doc page for QA activities and process etc. we want to use the qa-specs repo to publish the content to doc.openstack.org/qa/. 
This can be not so easy and need few tweaks on doc jobs. I will get into the details and then discuss with infra team. This is a low priority for now. 5. Topic: Plugin sanity check Current tempest-plugins-sanity job is not stable and so it is n-v. We want to make it voting by only installing the active plugins. many plugins are failing either they are dead or not so active. We agreed on: - make faulty plugins as blacklist with bug/patch link and notify the same on ML every time we detect any failure - Publish the blakclist on plugins-registry doc. - After that make this job voting, make the process of fixing and removing the faulty plugin which unblocks the tempest gate with author self-approve. - Make sanity job running on plugins which are dependent on each other. For example, congress-tempest-plugin use neutron-tempest-plugin, mistral-tempest-plugin etc so all these plugins should have a sanity job which can install and list these plugins tests only not all the plugins. 6. Topic: Planning for Patrole Stable release: We had a good amount of discussions for Patrole improvements area to release it stable. Refer the below ML thread for details and further discussions on this topic: - http://lists.openstack.org/pipermail/openstack-discuss/2019-May/005870.html 7. Topic: How to make tempest-full stable ( Don't fail integrated job when not related test will fail ) Current integrated-gate jobs (tempest-full) is not so stable for various bugs specially timeout. We discussed the few ideas to improve it. Refer the below ML thread for details and further discussions on this topic : http://lists.openstack.org/pipermail/openstack-discuss/2019-May/005871.html 8. Topic: OpenStack-Health Improvement Doug from AT&T has few improvement ideas for health dashboard which has been discussed in PTG: - Test Grouping - Define groups - Assigned test to groups - filter by groups - Compare 2 runs = Look into push AQuA report to subunit2SQL as a tool Action Items: - Doug is going to write the spec for plugin approach. All other ideas can be done after we have the plugin approach ready. - filter - presentation 9. Topic: Stein Backlogs & Train priorities & Planning We collected the Train items in below mentioned etherpad with the assignee. Anyone would like to help on any of the item, ping me on IRC or reply here. Etherpad: https://etherpad.openstack.org/p/qa-train-priority 10. Topic: grenade zuulv3 jobs review/discussions We did not get the chance to review these. Let's continue it after PTG. Full Detail discussion: https://etherpad.openstack.org/p/qa-train-ptg -gmann From liuyulong.xa at gmail.com Sun May 5 09:37:56 2019 From: liuyulong.xa at gmail.com (LIU Yulong) Date: Sun, 5 May 2019 17:37:56 +0800 Subject: [qa][ptg][nova][cinder][keystone][neutron][glance][swift][placement] How to make integrated-gate testing (tempest-full) more stable and fast In-Reply-To: <16a86db6ccd.d787148123989.2198391414179782565@ghanshyammann.com> References: <16a86db6ccd.d787148123989.2198391414179782565@ghanshyammann.com> Message-ID: +1 On Sun, May 5, 2019 at 3:18 PM Ghanshyam Mann wrote: > Current integrated-gate jobs (tempest-full) is not so stable for various > bugs specially timeout. We tried > to improve it via filtering the slow tests in the separate tempest-slow > job but the situation has not been improved much. > > We talked about the Ideas to make it more stable and fast for projects > especially when failure is not > related to each project. 
We are planning to split the integrated-gate > template (only tempest-full job as > first step) per related services. > > Idea: > - Run only dependent service tests on project gate. > - Tempest gate will keep running all the services tests as the integrated > gate at a centeralized place without any change in the current job. > - Each project can run the below mentioned template. > - All below template will be defined and maintained by QA team. > > I would like to know each 6 services which run integrated-gate jobs > > 1."Integrated-gate-networking" (job to run on neutron gate) > Tests to run in this template: neutron APIs , nova APIs, keystone APIs ? > All scenario currently running in tempest-full in the same way ( means > non-slow and in serial) > Improvement for neutron gate: exlcude the cinder API tests, glance API > tests, swift API tests, > > 2."Integrated-gate-storage" (job to run on cinder gate, glance gate) > Tests to run in this template: Cinder APIs , Glance APIs, Swift APIs, Nova > APIs and All scenario currently running in tempest-full in the same way ( > means non-slow and in serial) > Improvement for cinder, glance gate: excluded the neutron APIs tests, > Keystone APIs tests > > 3. "Integrated-gate-object-storage" (job to run on swift gate) > Tests to run in this template: Cinder APIs , Glance APIs, Swift APIs and > All scenario currently running in tempest-full in the same way ( means > non-slow and in serial) > Improvement for swift gate: excluded the neutron APIs tests, - Keystone > APIs tests, - Nova APIs tests. > Note: swift does not run integrated-gate as of now. > > 4. "Integrated-gate-compute" (job to run on Nova gate) > tests to run is : Nova APIs, Cinder APIs , Glance APIs ?, neutron APIs and > All scenario currently running in tempest-full in same way ( means non-slow > and in serial) > Improvement for Nova gate: excluded the swift APIs tests(not running in > current job but in future, it might), Keystone API tests. > > 5. "Integrated-gate-identity" (job to run on keystone gate) > Tests to run is : all as all project use keystone, we might need to run > all tests as it is running in integrated-gate. > But does keystone is being unsed differently by all services? if no then, > is it enough to run only single service tests say Nova or neutron ? > > 6. "Integrated-gate-placement" (job to run on placement gate) > Tests to run in this template: Nova APIs tests, Neutron APIs tests + > scenario tests + any new service depends on placement APIs > Improvement for placement gate: excluded the glance APIs tests, cinder > APIs tests, swift APIs tests, keystone APIs tests > > Thoughts on this approach? > > The important point is we must not lose the coverage of integrated testing > per project. So I would like to > get each project view if we are missing any dependency (proposed tests > removal) in above proposed templates. > > - https://etherpad.openstack.org/p/qa-train-ptg > > -gmann > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From colleen at gazlene.net Sun May 5 15:58:22 2019 From: colleen at gazlene.net (Colleen Murphy) Date: Sun, 05 May 2019 11:58:22 -0400 Subject: [dev][keystone][ptg] Keystone team action items Message-ID: Hi everyone, I will write an in-depth summary of the Forum and PTG some time in the coming week, but I wanted to quickly capture all the action items that came out of the last six days so that we don't lose too much focus: Colleen * move "Expand endpoint filters to Service Providers" spec[1] to attic * review "Policy Goals"[2] and "Policy Security Roadmap"[3] specs with Lance, refresh and possibly combine them * move "Unified model for assignments, OAuth, and trusts" spec[4] from ongoing to backlog, and circle up with Adam about refreshing it * update app creds spec[5] to defer access_rules_config * review app cred documentation with regard to proactive rotation * follow up with nova/other service teams on need for microversion support in access rules * circle up with Guang on fixing autoprovisioning for tokenless auth * keep up to date with IEEE/NIST efforts on standardizing federation * investigate undoing the foreign key constraint that breaks the pluggable resource driver * propose governance change to add caching as a base service * clean out deprecated cruft from keystonemiddleware * write up Outreachy/other internship application tasks [1] http://specs.openstack.org/openstack/keystone-specs/specs/keystone/backlog/service-providers-filters.html [2] http://specs.openstack.org/openstack/keystone-specs/specs/keystone/ongoing/policy-goals.html [3] http://specs.openstack.org/openstack/keystone-specs/specs/keystone/ongoing/policy-security-roadmap.html [4] http://specs.openstack.org/openstack/keystone-specs/specs/keystone/ongoing/unified-delegation.html [5] http://specs.openstack.org/openstack/keystone-specs/specs/keystone/train/capabilities-app-creds.html Lance * write up plan for tempest testing of system scope * break up unified limits testing plan into separate items, one for CRUD in keystone and one for quota and limit validation in oslo.limit[6] * write up spec for assigning roles on root domain * (with Morgan) check for and add interface in oslo.policy to see if policy has been overridden [6] https://trello.com/c/kbKvhYBz/20-test-unified-limits-in-tempest Kristi * finish mutable config patch * propose "model-timestamps" spec for Train[7] * move "Add Multi-Version Support to Federation Mappings" spec[8] to attic * review and possibly complete "Devstack Plugin for Keystone" spec[9] * look into "RFE: Improved OpenID Connect Support" spec[10] * update refreshable app creds spec[11] to make federated users expire rather then app creds * deprecate federated_domain_name [7] http://specs.openstack.org/openstack/keystone-specs/specs/keystone/backlog/model-timestamps.html [8] http://specs.openstack.org/openstack/keystone-specs/specs/keystone/backlog/versioned-mappings.html [9] http://specs.openstack.org/openstack/keystone-specs/specs/keystone/ongoing/devstack-plugin.html [10] https://bugs.launchpad.net/keystone/+bug/1815971 [11] https://review.opendev.org/604201 Vishakha * investigate effort needed for Alembic migrations spec[12] (with help from Morgan) * merge "RFE: Retrofit keystone-manage db_* commands to work with Alembic"[13] into "Use Alembic for database migrations" spec * remove deprecated [signing] config * remove deprecated [DEFAULT]/admin_endpoint config * remove deprecated [token]/infer_roles config [12] 
http://specs.openstack.org/openstack/keystone-specs/specs/keystone/backlog/alembic.html [13] https://bugs.launchpad.net/keystone/+bug/1816158 Morgan * review "Materialize Project Hierarchy" spec[14] and make sure it reflects the current state of the world, keep it in the backlog * move "Functional Testing" spec[15] to attic * move "Object Dependency Lifecycle" spec[16] to complete * move "Add Endpoint Filter Enforcement to Keystonemiddleware" spec[17] to attic * move "Request Helpers" spec[18] to attic * create PoC of external IdP proxy component * (with Lance) check for and add interface in oslo.policy to see if policy has been overridden * investigate removing [eventlet_server] config section * remove remaining PasteDeploy things * remove PKI(Z) cruft from keystonemiddleware * refactor keystonemiddleware to have functional components instead of needing keystone to instantiate keystonemiddleware objects for auth [14] http://specs.openstack.org/openstack/keystone-specs/specs/keystone/backlog/materialize-project-hierarchy.html [15] http://specs.openstack.org/openstack/keystone-specs/specs/keystone/ongoing/functional-testing.html [16] http://specs.openstack.org/openstack/keystone-specs/specs/keystone/ongoing/object-dependency-lifecycle.html [17] http://specs.openstack.org/openstack/keystone-specs/specs/keystonemiddleware/backlog/endpoint-enforcement-middleware.html [18] http://specs.openstack.org/openstack/keystone-specs/specs/keystonemiddleware/backlog/request-helpers.html Gage * investigate with operators about specific use case behind "RFE: Whitelisting (opt-in) users/projects/domains for PCI compliance"[19] request * follow up on "RFE: Token returns Project's tag properties"[20] * remove use of keystoneclient from keystonemiddleware [19] https://bugs.launchpad.net/keystone/+bug/1637146 [20] https://bugs.launchpad.net/keystone/+bug/1807697 Rodrigo * Propose finishing "RFE: Project Tree Deletion/Disabling"[21] as an Outreachy project [21] https://bugs.launchpad.net/keystone/+bug/1816105 Adam * write up super-spec on explicit project IDs plus predictable IDs Thanks everyone for a productive week and for all your hard work! Colleen From jeremyfreudberg at gmail.com Sun May 5 20:24:14 2019 From: jeremyfreudberg at gmail.com (Jeremy Freudberg) Date: Sun, 5 May 2019 16:24:14 -0400 Subject: [ironic][neutron][ops] Ironic multi-tenant networking, VMs In-Reply-To: References: Message-ID: Sukhdev- yes it helps a ton. Thank you! If anyone reading the list has a citable example of this, public on the web, feel free to chime in. On Sat, May 4, 2019 at 3:43 PM Sukhdev Kapur wrote: > > Jeremy, > > If you want to use VxLAN networks for the bremetal hosts, you would use ML2 VLAN networks, as Julia described, between the host and switch port. That VLAN will then terminate into a VTAP on the switch port which will carry appropriate tags in the VxLAN overlay. > > Hope this helps > -Sukhdev > > > On Thu, May 2, 2019 at 9:28 PM Jeremy Freudberg wrote: >> >> Thanks Julia; this is helpful. >> >> Thanks also for reading my mind a bit, as I am thinking of the VXLAN >> case... I can't help but notice that in the Ironic CI jobs, multi >> tenant networking being used seems to entail VLANs as the tenant >> network type (instead of VXLAN). Is it just coincidence / how the gate >> just is, or is it hinting something about how VXLAN and bare metal get >> along? >> >> On Wed, May 1, 2019 at 6:38 PM Julia Kreger wrote: >> > >> > Greetings Jeremy, >> > >> > Best Practice wise, I'm not directly aware of any. 
It is largely going >> > to depend upon your Neutron ML2 drivers and network fabric. >> > >> > In essence, you'll need an ML2 driver which supports the vnic type of >> > "baremetal", which is able to able to orchestrate the switch port port >> > binding configuration in your network fabric. If your using vlan >> > networks, in essence you'll end up with a neutron physical network >> > which is also a trunk port to the network fabric, and the ML2 driver >> > would then appropriately tag the port(s) for the baremetal node to the >> > networks required. In the CI gate, we do this in the "multitenant" >> > jobs where networking-generic-switch modifies the OVS port >> > configurations directly. >> > >> > If specifically vxlan is what your looking to use between VMs and >> > baremetal nodes, I'm unsure of how you would actually configure that, >> > but in essence the VXLANs would still need to be terminated on the >> > switch port via the ML2 driver. >> > >> > In term of Ironic's documentation, If you haven't already seen it, you >> > might want to check out ironic's multi-tenancy documentation[1]. >> > >> > -Julia >> > >> > [1]: https://docs.openstack.org/ironic/latest/admin/multitenancy.html >> > >> > On Wed, May 1, 2019 at 10:53 AM Jeremy Freudberg >> > wrote: >> > > >> > > Hi all, >> > > >> > > I'm wondering if anyone has any best practices for Ironic bare metal >> > > nodes and regular VMs living on the same network. I'm sure if involves >> > > Ironic's `neutron` multi-tenant network driver, but I'm a bit hazy on >> > > the rest of the details (still very much in the early stages of >> > > exploring Ironic). Surely it's possible, but I haven't seen mention of >> > > this anywhere (except the very old spec from 2015 about introducing >> > > ML2 support into Ironic) nor is there a gate job resembling this >> > > specific use. >> > > >> > > Ideas? >> > > >> > > Thanks, >> > > Jeremy >> > > >> From jeremyfreudberg at gmail.com Sun May 5 20:36:49 2019 From: jeremyfreudberg at gmail.com (Jeremy Freudberg) Date: Sun, 5 May 2019 16:36:49 -0400 Subject: [sahara][all] Sahara virtual PTG reminder (approaching quickly!) Message-ID: The Sahara virtual PTG will take place Monday, May 6, at 15:00 UTC. All are welcome. Etherpad link: https://etherpad.openstack.org/p/sahara-train-ptg Bluejeans link: https://bluejeans.com/6304900378 From doka.ua at gmx.com Sun May 5 21:34:01 2019 From: doka.ua at gmx.com (Volodymyr Litovka) Date: Mon, 6 May 2019 00:34:01 +0300 Subject: [octavia] Amphora agent returned unexpected result code 500 Message-ID: <5798b929-737e-fd29-a2a5-7c1246a632bb@gmx.com> Dear colleagues, trying to launch Amphorae, getting the following error in logs: Amphora agent returned unexpected result code 500 with response {'message': 'Error plugging VIP', 'details': 'SIOCADDRT: Network is unreachable\nFailed to bring up eth1.\n'} While details below, questions are here: - whether it's enough to assign roles as explained below to special project for Octavia? - whether it can be issue with image, created by diskimage_create.sh? - any recommendation on where to search for the problem. Thank you. My environment is: - Openstack Rocky - Octavia 4.0 - amphora instance runs in special project "octavia", where users octavia, nova and neutron have admin role - amphora image prepared using original git repo process and elements without modification: * git clone * cd octavia * diskimage-create/diskimage-create.sh * openstack image create [ ... 
] --tag amphora After created, amphora instance successfully connects to management network and can be accessed by controller: 2019-05-05 20:46:06.851 18234 DEBUG octavia.amphorae.drivers.haproxy.rest_api_driver [-] Connected to amphora. Response: request /opt/openstack/lib/python3.6/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py:486 2019-05-05 20:46:06.852 18234 DEBUG octavia.controller.worker.tasks.amphora_driver_tasks [-] Successfuly connected to amphora 5bec4c09-a209-4e73-a66e-e4fc0fb8ded5: {'ipvsadm_version': '1:1.28-3', 'api_version': '0.5', 'haproxy_version': '1.6.3-1ubuntu0.2', 'hostname': 'amphora-5bec4c09-a209-4e73-a66e-e4fc0fb8ded5', 'keepalived_version': '1:1.2.24-1ubuntu0.16.04.1'} execute /opt/openstack/lib/python3.6/site-packages/octavia/controller/worker/tasks/amphora_driver_tasks.py:372 [ ... ] 2019-05-05 20:46:06.990 18234 DEBUG octavia.controller.worker.tasks.network_tasks [-] Plumbing VIP for amphora id: 5bec4c09-a209-4e73-a66e-e4fc0fb8ded5 execute /opt/openstack/lib/python3.6/site-packages/octavia/controller/worker/tasks/network_tasks.py:382 2019-05-05 20:46:07.003 18234 DEBUG octavia.network.drivers.neutron.base [-] Neutron extension security-group found enabled _check_extension_enabled /opt/openstack/lib/python3.6/site-packages/octavia/network/drivers/neutron/base.py:66 2019-05-05 20:46:07.013 18234 DEBUG octavia.network.drivers.neutron.base [-] Neutron extension dns-integration found enabled _check_extension_enabled /opt/openstack/lib/python3.6/site-packages/octavia/network/drivers/neutron/base.py:66 2019-05-05 20:46:07.025 18234 DEBUG octavia.network.drivers.neutron.base [-] Neutron extension qos found enabled _check_extension_enabled /opt/openstack/lib/python3.6/site-packages/octavia/network/drivers/neutron/base.py:66 2019-05-05 20:46:07.044 18234 DEBUG octavia.network.drivers.neutron.base [-] Neutron extension allowed-address-pairs found enabled _check_extension_enabled /opt/openstack/lib/python3.6/site-packages/octavia/network/drivers/neutron/base.py:66 2019-05-05 20:46:08.406 18234 DEBUG octavia.network.drivers.neutron.allowed_address_pairs [-] Created vip port: b0398cc8-6d52-4f12-9f1f-1141b0f10751 for amphora: 5bec4c09-a209-4e73-a66e-e4fc0fb8ded5 _plug_amphora_vip /opt/openstack/lib/python3.6/site-packages/octavia/network/drivers/neutron/allowed_address_pairs.py:97 [ ... ] 2019-05-05 20:46:15.405 18234 DEBUG octavia.network.drivers.neutron.allowed_address_pairs [-] Retrieving network details for amphora 5bec4c09-a209-4e73-a66e-e4fc0fb8ded5 _get_amp_net_configs /opt/openstack/lib/python3.6/site-packages/octavia/network/drivers/neutron/allowed_address_pairs.py:596 [ ... ] 2019-05-05 20:46:15.837 18234 DEBUG octavia.amphorae.drivers.haproxy.rest_api_driver [-] Post-VIP-Plugging with vrrp_ip 10.0.2.13 vrrp_port b0398cc8-6d52-4f12-9f1f-1141b0f10751 post_vip_plug /opt/openstack/lib/python3.6/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py:233 2019-05-05 20:46:15.838 18234 DEBUG octavia.amphorae.drivers.haproxy.rest_api_driver [-] request url plug/vip/10.0.2.24 request /opt/openstack/lib/python3.6/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py:462 2019-05-05 20:46:15.838 18234 DEBUG octavia.amphorae.drivers.haproxy.rest_api_driver [-] request url https://172.16.252.35:9443/0.5/plug/vip/10.0.2.24 request /opt/openstack/lib/python3.6/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py:465 2019-05-05 20:46:16.089 18234 DEBUG octavia.amphorae.drivers.haproxy.rest_api_driver [-] Connected to amphora. 
Response: request /opt/openstack/lib/python3.6/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py:486 2019-05-05 20:46:16.090 18234 ERROR octavia.amphorae.drivers.haproxy.exceptions [-] Amphora agent returned unexpected result code 500 with response {'message': 'Error plugging VIP', 'details': 'SIOCADDRT: Network is unreachable\nFailed to bring up eth1.\n'} During the process, NEUTRON logs contains the following records that indicate the following (note "status=DOWN" in neutron-dhcp-agent; later immediately before to be deleted, it will shed 'ACTIVE'): May  5 20:46:13 ardbeg neutron-dhcp-agent: 2019-05-05 20:46:13.857 1804 INFO neutron.agent.dhcp.agent [req-07833602-9579-403b-a264-76fd3ee408ee a18f38c780074c6280dde5edad159666 41a02a69918849509f4102b04f8a7de9 - - -] Trigger reload_allocations for port admin_state_up=True, allowed_address_pairs=[{u'ip_address': u'10.0.2.24', u'mac_address': u'72:d0:1c:4c:94:91'}], binding:host_id=ardbeg, binding:profile=, binding:vif_details=datapath_type=system, ovs_hybrid_plug=False, port_filter=True, binding:vif_type=ovs, binding:vnic_type=normal, created_at=2019-05-05T20:46:07Z, description=, device_id=f1bce6e9-be5b-464b-8f64-686f36e9de1f, device_owner=compute:nova, dns_assignment=[{u'hostname': u'amphora-5bec4c09-a209-4e73-a66e-e4fc0fb8ded5', u'ip_address': u'10.0.2.13', u'fqdn': u'amphora-5bec4c09-a209-4e73-a66e-e4fc0fb8ded5.loqal.'}], dns_domain=, dns_name=amphora-5bec4c09-a209-4e73-a66e-e4fc0fb8ded5, extra_dhcp_opts=[], fixed_ips=[{u'subnet_id': u'24b10886-3d53-4aee-bdc6-f165b242ae4f', u'ip_address': u'10.0.2.13'}], id=b0398cc8-6d52-4f12-9f1f-1141b0f10751, mac_address=72:d0:1c:4c:94:91, name=octavia-lb-vrrp-5bec4c09-a209-4e73-a66e-e4fc0fb8ded5, network_id=b24d2830-eec6-4abd-82f2-ac71c8ecbf40, port_security_enabled=True, project_id=41a02a69918849509f4102b04f8a7de9, qos_policy_id=None, revision_number=5, security_groups=[u'6df53a15-6afc-4c99-b464-03de4f546b4f'], status=DOWN, tags=[], tenant_id=41a02a69918849509f4102b04f8a7de9, updated_at=2019-05-05T20:46:13Z May  5 20:46:14 ardbeg neutron-openvswitch-agent: 2019-05-05 20:46:14.185 31542 INFO neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent [req-a4425cdb-afc1-4f6a-9ef9-c8706e3285d6 - - - - -] Port b0398cc8-6d52-4f12-9f1f-1141b0f10751 updated. 
Details: {'profile': {}, 'network_qos_policy_id': None, 'qos_policy_id': None, 'allowed_address_pairs': [{'ip_address': AuthenticIPNetwork('10.0.2.24'), 'mac_address': EUI('72:d0:1c:4c:94:91')}], 'admin_state_up': True, 'network_id': 'b24d2830-eec6-4abd-82f2-ac71c8ecbf40', 'segmentation_id': 437, 'fixed_ips': [{'subnet_id': '24b10886-3d53-4aee-bdc6-f165b242ae4f', 'ip_address': '10.0.2.13'}], 'device_owner': u'compute:nova', 'physical_network': None, 'mac_address': '72:d0:1c:4c:94:91', 'device': u'b0398cc8-6d52-4f12-9f1f-1141b0f10751', 'port_security_enabled': True, 'port_id': 'b0398cc8-6d52-4f12-9f1f-1141b0f10751', 'network_type': u'vxlan', 'security_groups': [u'6df53a15-6afc-4c99-b464-03de4f546b4f']} May  5 20:46:14 ardbeg neutron-openvswitch-agent: 2019-05-05 20:46:14.197 31542 INFO neutron.agent.securitygroups_rpc [req-a4425cdb-afc1-4f6a-9ef9-c8706e3285d6 - - - - -] Preparing filters for devices set([u'b0398cc8-6d52-4f12-9f1f-1141b0f10751']) Note Nova returns response 200/completed: May  5 20:46:14 controller-l neutron-server: 2019-05-05 20:46:14.326 20981 INFO neutron.notifiers.nova [-] Nova event response: {u'status': u'completed', u'tag': u'b0398cc8-6d52-4f12-9f1f-1141b0f10751', u'name': u'network-changed', u'server_uuid': u'f1bce6e9-be5b-464b-8f64-686f36e9de1f', u'code': 200} and "openstack server show" shows both NICs are attached to the amphorae: $ openstack server show f1bce6e9-be5b-464b-8f64-686f36e9de1f +-------------------------------------+------------------------------------------------------------+ | Field                               | Value                                                      | +-------------------------------------+------------------------------------------------------------+ [ ... ] | addresses                           | octavia-net=172.16.252.35; u1000-p1000-xbone=10.0.2.13     | +-------------------------------------+------------------------------------------------------------+ Later Octavia worker reports the following: 2019-05-05 20:46:16.124 18234 DEBUG octavia.controller.worker.controller_worker [-] Task 'STANDALONE-octavia-plug-net-subflow-octavia-amp-post-vip-plug' (f105ced1-72c6-4116-b582-599a21cdee36) transitioned into state 'REVERTING' from state 'FAILURE' _task_receiver /opt/openstack/lib/python3.6/site-packages/taskflow/listeners/logging.py:194 2019-05-05 20:46:16.127 18234 WARNING octavia.controller.worker.controller_worker [-] Task 'STANDALONE-octavia-plug-net-subflow-octavia-amp-post-vip-plug' (f105ced1-72c6-4116-b582-599a21cdee36) transitioned into state 'REVERTED' from state 'REVERTING' with result 'None' 2019-05-05 20:46:16.141 18234 DEBUG octavia.controller.worker.controller_worker [-] Task 'STANDALONE-octavia-plug-net-subflow-reload-amp-after-plug-vip' (c4d6222e-2508-4a9c-9514-e7f9bcf84e31) transitioned into state 'REVERTING' from state 'SUCCESS' _task_receiver /opt/openstack/lib/python3.6/site-packages/taskflow/listeners/logging.py:194 2019-05-05 20:46:16.142 18234 WARNING octavia.controller.worker.controller_worker [-] Task 'STANDALONE-octavia-plug-net-subflow-reload-amp-after-plug-vip' (c4d6222e-2508-4a9c-9514-e7f9bcf84e31) transitioned into state 'REVERTED' from state 'REVERTING' with result 'None' 2019-05-05 20:46:16.146 18234 DEBUG octavia.controller.worker.controller_worker [-] Task 'STANDALONE-octavia-plug-net-subflow-ocatvia-amp-update-vip-data' (2e1d1a04-282d-43b7-8c4f-fe31e75804ea) transitioned into state 'REVERTING' from state 'SUCCESS' _task_receiver 
/opt/openstack/lib/python3.6/site-packages/taskflow/listeners/logging.py:194 2019-05-05 20:46:16.148 18234 WARNING octavia.controller.worker.controller_worker [-] Task 'STANDALONE-octavia-plug-net-subflow-ocatvia-amp-update-vip-data' (2e1d1a04-282d-43b7-8c4f-fe31e75804ea) transitioned into state 'REVERTED' from state 'REVERTING' with result 'None' 2019-05-05 20:46:16.173 18234 DEBUG octavia.controller.worker.controller_worker [-] Task 'STANDALONE-octavia-plug-net-subflow-octavia-amp-plug-vip' (c63a5bed-f531-4ed3-83d2-bce72e835932) transitioned into state 'REVERTING' from state 'SUCCESS' _task_receiver /opt/openstack/lib/python3.6/site-packages/taskflow/listeners/logging.py:194 2019-05-05 20:46:16.174 18234 WARNING octavia.controller.worker.tasks.network_tasks [-] Unable to plug VIP for amphora id 5bec4c09-a209-4e73-a66e-e4fc0fb8ded5 load balancer id e01c6ff5-179a-4ed5-ae5d-1d00d6c584b8 and Neutron then deletes port but NOTE that immediately before deletion port reported by neutron-dhcp-agent as ACTIVE: May  5 20:46:17 ardbeg neutron-dhcp-agent: 2019-05-05 20:46:17.080 1804 INFO neutron.agent.dhcp.agent [req-835e5b91-28e5-44b9-a463-d04a0323294f a18f38c780074c6280dde5edad159666 41a02a69918849509f4102b04f8a7de9 - - -] Trigger reload_allocations for port admin_state_up=True, allowed_address_pairs=[], binding:host_id=ardbeg, binding:profile=, binding:vif_details=datapath_type=system, ovs_hybrid_plug=False, port_filter=True, binding:vif_type=ovs, binding:vnic_type=normal, created_at=2019-05-05T20:46:07Z, description=, device_id=f1bce6e9-be5b-464b-8f64-686f36e9de1f, device_owner=compute:nova, dns_assignment=[{u'hostname': u'amphora-5bec4c09-a209-4e73-a66e-e4fc0fb8ded5', u'ip_address': u'10.0.2.13', u'fqdn': u'amphora-5bec4c09-a209-4e73-a66e-e4fc0fb8ded5.loqal.'}], dns_domain=, dns_name=amphora-5bec4c09-a209-4e73-a66e-e4fc0fb8ded5, extra_dhcp_opts=[], fixed_ips=[{u'subnet_id': u'24b10886-3d53-4aee-bdc6-f165b242ae4f', u'ip_address': u'10.0.2.13'}], id=b0398cc8-6d52-4f12-9f1f-1141b0f10751, mac_address=72:d0:1c:4c:94:91, name=octavia-lb-vrrp-5bec4c09-a209-4e73-a66e-e4fc0fb8ded5, network_id=b24d2830-eec6-4abd-82f2-ac71c8ecbf40, port_security_enabled=True, project_id=41a02a69918849509f4102b04f8a7de9, qos_policy_id=None, revision_number=8, security_groups=[u'ba20352e-95b9-4c97-a688-59d44e3aa8cf'], status=ACTIVE, tags=[], tenant_id=41a02a69918849509f4102b04f8a7de9, updated_at=2019-05-05T20:46:16Z May  5 20:46:17 controller-l neutron-server: 2019-05-05 20:46:17.086 20981 INFO neutron.wsgi [req-835e5b91-28e5-44b9-a463-d04a0323294f a18f38c780074c6280dde5edad159666 41a02a69918849509f4102b04f8a7de9 - default default] 10.0.10.31 "PUT /v2.0/ports/b0398cc8-6d52-4f12-9f1f-1141b0f10751 HTTP/1.1" status: 200  len: 1395 time: 0.6318841 May  5 20:46:17 controller-l neutron-server: 2019-05-05 20:46:17.153 20981 INFO neutron.wsgi [req-37ee0da3-8dcc-4fb8-9cd3-91c5a8dcedef a18f38c780074c6280dde5edad159666 41a02a69918849509f4102b04f8a7de9 - default default] 10.0.10.31 "GET /v2.0/ports/b0398cc8-6d52-4f12-9f1f-1141b0f10751 HTTP/1.1" status: 200  len: 1395 time: 0.0616651 May  5 20:46:18 controller-l neutron-server: 2019-05-05 20:46:18.179 20981 INFO neutron.wsgi [req-8896542e-5dcb-4e6d-9379-04cd88c4035b a18f38c780074c6280dde5edad159666 41a02a69918849509f4102b04f8a7de9 - default default] 10.0.10.31 "DELETE /v2.0/ports/b0398cc8-6d52-4f12-9f1f-1141b0f10751 HTTP/1.1" status: 204  len: 149 time: 1.0199890 Thank you. -- Volodymyr Litovka "Vision without Execution is Hallucination." 
-- Thomas Edison -------------- next part -------------- An HTML attachment was scrubbed... URL: From cdent+os at anticdent.org Sun May 5 23:54:18 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Sun, 5 May 2019 17:54:18 -0600 (MDT) Subject: [placement][nova][ptg] Summary: Consumer Types Message-ID: We had a brief conversation in the placement room yesterday (Saturday May 5th) to confirm we were all on the same page with regard to consumer types. These provide a way to say that a set of allocations "is an instance" or "is a migration" and will help with quota accounting. We decided that since no one has stepped forward with a more complicated scenario, at this time, we will go with the simplest implementation, for now: * add a consumer types table that has a key and string (length to be determined, values controlled by clients) that represents a "type". For example (1, 'instance') * add a column on consumer table that takes one of those keys * create a new row in the types table only when a new type is created, don't worry about expiring them * provide an online migration to default existing consumers to 'instance' and treat unset types as 'instance' [1]. This probably needs some confirmation from mel and others that it is suitable. If not, please provide an alternative suggestion. * In a new microversion: allow queries to /usages to use a consumer type parameter to limit results to particular types and add 'consumer_type' key will be added to the body of an 'allocations' in both PUT and POST. * We did not discuss in the room, but the email thread [2] did: We may need to consider grouping /usages results by type but we could probably get by without changing that (and do multiple requests, sometimes). Surya, thank her very much, has volunteered to work on this and has started a spec at [3]. We have decided, again due to lack of expressed demand, to do any work (at this time) related to resource provider partitioning [4]. There's a pretty good idea on how to do this, but enough other stuff going on there's not time. Because we decided in that thread that any one resource provider can only be in one partition, there is also a very easy workaround: Run another placement server. It takes only a few minutes to set one up [5] This means that all of the client services of a single placement service need to coordinate on what consumer types they are using. (This was already true, but stated here for emphasis.) [1] I'm tempted to test how long a million or so rows of consumers would take to update. If it is short enough we may wish to break with the nova tradition of not doing data migrations in schema migrations (placement-manage db sync). But we didn't get a chance to discuss that in the room. [2] http://lists.openstack.org/pipermail/openstack-discuss/2019-April/thread.html#4720 [3] https://review.opendev.org/#/c/654799/ [4] http://lists.openstack.org/pipermail/openstack-discuss/2019-April/004721.html [5] https://docs.openstack.org/placement/latest/install/from-pypi.html -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent tw: @anticdent From cdent+os at anticdent.org Mon May 6 00:21:09 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Sun, 5 May 2019 18:21:09 -0600 (MDT) Subject: [placement][nova][ptg] Summary: Nested Magic With Placement In-Reply-To: References: Message-ID: On Sat, 4 May 2019, Chris Dent wrote: > On Fri, 3 May 2019, Chris Dent wrote: > >> * This (Friday) afternoon at the PTG I'll be creating rfe stories >> associated with these changes. 
If you'd like to help with that, find >> me in the placement room (109). We'll work out whether those >> stories needs specs in the normally processing of the stories. >> We'll also need to find owners for many of them. > > I decided to capture all of this in one story: > > https://storyboard.openstack.org/#!/story/2005575 > > which will likely need to be broken into several stories, or at > least several detailed tasks. I have added some images to the story. They are from flipchart drawings made during yesterday's discussions and reflect some syntax and semantics decisions we made. -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent tw: @anticdent From cdent+os at anticdent.org Mon May 6 00:57:51 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Sun, 5 May 2019 18:57:51 -0600 (MDT) Subject: [placement][ironic][blazar] Summary: Placment + Ironic and Blazar Message-ID: There were a few different discussions about Ironic using placement in various ways. On line 117 (for now) of the placement PTG etherpad [1] there are some notes about things that Ironic and Blazar could do for reservations. These are not expected to require any changes in placement. Dmitry and Tetsuro may have more to say about this. There was also a separate discussion about the options for using Placement do granular/detailed expression of available resources but full/chunky consumption of resources in a context where Ironic is running without Nova. That is: * record inventory for baremental nodes that say the inventory of node1 is CPU:24,DISK_GB:1048576,MEMORY_MB=1073741824 (and whatever else). * query something smaller (eg CPU:8,DISK_GB:524288,MEMORY_MB:536870912) in a GET /allocation_candidates * include node1 in the results along with others, let the client side sort using provider summaries * send back some mode of allocation that consumes the entire inventory of node1 There were a few different ideas on how to do that last step. One idea would have required different resource providers have an attribute that caused a change in behavior when allocated to. This was dismissed as "too much business logic in the guts". Another option was a flag on PUT /allocations that says "consume everything, despite what I've said". However, the option that was most favored was a new endpoint (name to be determined later if we ever do this) that is for the purpose of "fullly consuming the named resource provider". Clearly this is something that fairly contrary to how Nova imagines baremetal instances, but makes pretty good sense outside of the context where people want to be able to use placement to simultaneously get a flexible view of their baremetal resources and also track them accurately. There are no immediate plans to do this, but there are plans for Dmitry to continue investigating the options and seeing what can or cannot work. Having the feature described above would make things cleaner. I have made a placeholder (and low priority) story [2] that will link back to this email. [1] https://etherpad.openstack.org/p/placement-ptg-train [2] https://storyboard.openstack.org/#!/story/2005575 -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent tw: @anticdent From eumel at arcor.de Mon May 6 00:58:30 2019 From: eumel at arcor.de (Frank Kloeker) Date: Mon, 06 May 2019 02:58:30 +0200 Subject: [I18n] Translation plan Train Message-ID: <64b2c98efa6931fbf9d3a5e288a3cf79@arcor.de> Hello Stackers, hopefully you enjoyed the time during the Open Infra Summit and the PTG in Denver - onsite or remotely. 
Maybe you inspired also the spirit from something new which will start from now on. As usually we at I18n after release we merge translations back from stable branch to master and starting with a new translation plan [1]. Without a simple copy of the last one I investigated project status under the help of commits and the OpenStack Health Tracker. That's very helpful to see which projects are active and which one have a break, so we can adjust translation priority. At the end the translation plan will be a little bit shorter and we have enough space to onboard new stuff. Additional project docs are not decided yet. But we have the new projects outside OpenStack like Airship, StarlingX and Zuul and if you think on the next Summit in Shanghai and the target downstream users, a translated version of user interfaces or documentation might be useful. If you have questions or remarks, let me know, or Ian, or the list :) Frank [1] https://translate.openstack.org/version-group/view/Train-dashboard-translation/projects From cdent+os at anticdent.org Mon May 6 01:23:31 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Sun, 5 May 2019 19:23:31 -0600 (MDT) Subject: [placement][ptg] Open Questions Message-ID: A few questions we failed to resolve during the PTG that we should work out over the next couple of weeks. * There are two specs in progress related to more flexible ways to filter traits: * any trait in allocation candidates https://review.opendev.org/#/c/649992/ * support mixing required traits with any traits https://review.opendev.org/#/c/649368/ Do we have pending non-placement features which depend on the above being completed? I got the impression during the nova-placement xproj session that maybe they were, but it's not clear. Anyone willing to state one way or another? * We had several RFE stories already in progress, and have added a few more during the PTG. We have not done much in the way of prioritizing these. We certainly can't do them all. Here's a link to the current RFE stories in the placement group (this includes placement, osc-placement and os-*). https://storyboard.openstack.org/#!/worklist/594 I've made a simple list of those on an etherpad, please register you +1 or -1 (or nothing) on each of those. Keep in mind that there are several features in "Update nested provider support to address train requirements" and that we've already committed to them. Please let me know what I've forgotten. -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent tw: @anticdent From sergey at vilgelm.info Mon May 6 01:36:00 2019 From: sergey at vilgelm.info (Sergey Vilgelm) Date: Sun, 5 May 2019 20:36:00 -0500 Subject: [qa][ptg][patrole] RBAC testing improvement ideas for Patrole In-Reply-To: <16a86d4834e.e46610fc23956.8020827235456111857@ghanshyammann.com> References: <16a86d4834e.e46610fc23956.8020827235456111857@ghanshyammann.com> Message-ID: <7a5dcff9-ca99-496e-a022-f06830fd03a5@Spark> Hi, Gmann, thank you so much. 1. I’m not sure that I understood the #1. Do you mean that oslo.policy will raise a special exceptions for successful and unsuccessful verification if the flag is set? So a service will see the exception and just return it. And Patorle can recognize those exceptions? I’m totally agree with using one job for one services, It can give us a possibility to temporary disable some services and allow patches for other services to be tested and merged. 2. +1 for the option 2. 
We can decrease the number of jobs and have just one job for one services, but we need to think about how to separate the logs. IMO we need to extend the `action` decorator to run a test 9 times (depends on the configuration) and memorize all results for all combinations and use something like `if not all(results): raise PatroleException()` -- Sergey Vilgelm https://www.vilgelm.info On May 5, 2019, 2:15 AM -0500, Ghanshyam Mann , wrote: > Patrole is emerging as a good tool for RBAC testing. AT&T already running it on their production cloud and > we have got a good amount of interest/feedback from other operators. > > We had few discussions regarding the Patrole testing improvement during PTG among QA, Nova, Keystone team. > I am writing the summary of those discussions below and would like to get the opinion from Felipe & Sergey also. > > 1. How to improve the Patrole testing time: > Currently Patrole test perform the complete API operaion which takes time and make Patrole testing > very long. Patrole is responsible to test the policies only so does not need to wait for API complete operation > to be completed. > John has a good idea to handle that via flag. If that flag is enabled (per service and disabled by default) then > oslo.policy can return some different error code on success (other than 403). The API can return the response > with that error code which can be treated as pass case in Patrole. > Morgan raises a good point on making it per API call than global. We can do that as next step and let's > start with the global flag per service as of now? > - https://etherpad.openstack.org/p/ptg-train-xproj-nova-keystone > > Another thing we should improve in current Patrole jobs is to separate the jobs per service. Currently, all 5 services > are installed and run in a single job. Running all on Patrole gate is good but the project side gate does not need to run > any other service tests. For example, patrole-keystone which can install the only keystone and run only > keystone tests. This way project can reuse the patrole jobs only and does not need to prepare a separate job. > > 2. How to run patrole tests with all negative, positive combination for all scope + defaults roles combinations: > - Current jobs patrole-admin/member/reader are able to test the negative pattern. For example: > patrole-member job tests the admin APIs in a negative way and make sure test is passed only if member > role gets 403. > - As we have scope_type support also we need to extend the jobs to run for all 9 combinations of 3 scopes > (system, project, domain) and 3 roles(admin, member, reader). > - option1: running 9 different jobs with each combination as we do have currently > for admin, member, reader role. The issue with this approach is gate will take a lot of time to > run these 9 jobs separately. > - option2: Run all the 9 combinations in a single job with running the tests in the loop with different > combination of scope_roles. This might require the current config option [role] to convert to list type > and per service so that the user can configure what all default roles are available for corresponding service. > This option can save a lot of time to avoid devstack installation time as compared to 9 different jobs option. > > -gmann > > -------------- next part -------------- An HTML attachment was scrubbed... 
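A minimal sketch of the option-2 idea described above (one job, looping the same test body over all scope/role combinations and only passing if every combination behaves as expected). This is not Patrole's actual decorator; SCOPES, ROLES, switch_credentials() and RbacValidationError are invented names for illustration only.

import functools
import itertools

SCOPES = ("system", "domain", "project")
ROLES = ("admin", "member", "reader")

class RbacValidationError(Exception):
    """Raised when at least one scope/role combination misbehaves."""

def rbac_action(expected):
    """Run the decorated test once per scope/role pair.

    ``expected`` is a callable (scope, role) -> bool saying whether the
    call should be authorized for that combination.
    """
    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(self, *args, **kwargs):
            results = []
            for scope, role in itertools.product(SCOPES, ROLES):
                # Hypothetical helper that switches the credentials the
                # test client uses before re-running the same test body.
                self.switch_credentials(scope=scope, role=role)
                try:
                    test_func(self, *args, **kwargs)
                    results.append(expected(scope, role))
                except Exception:
                    # A failure (e.g. 403) is only acceptable when this
                    # combination was expected to be forbidden.
                    results.append(not expected(scope, role))
            if not all(results):
                raise RbacValidationError(
                    "RBAC check failed for at least one scope/role pair")
        return wrapper
    return decorator

Such a loop trades nine separate devstack installations for nine in-process iterations, at the cost of log output that has to be separated per combination, which is the concern noted above.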
URL: From aaronzhu1121 at gmail.com Mon May 6 02:30:05 2019 From: aaronzhu1121 at gmail.com (Rong Zhu) Date: Mon, 6 May 2019 10:30:05 +0800 Subject: [stackalytics] Reported numbers seem inaccurate In-Reply-To: References: Message-ID: Hi Sergey, Do we have any process about my colleague's data loss problem? Sergey Nikitin 于2019年4月29日 周一19:57写道: > Thank you for information! I will take a look > > On Mon, Apr 29, 2019 at 3:47 PM Rong Zhu wrote: > >> Hi there, >> >> Recently we found we lost a person's data from our company at the >> stackalytics website. >> You can check the merged patch from [0], but there no date from >> the stackalytics website. >> >> stackalytics info as below: >> Company: ZTE Corporation >> Launchpad: 578043796-b >> Gerrit: gengchc2 >> >> Look forward to hearing from you! >> > Best Regards, Rong Zhu > >> -- Thanks, Rong Zhu -------------- next part -------------- An HTML attachment was scrubbed... URL: From valleru at cbio.mskcc.org Mon May 6 07:26:51 2019 From: valleru at cbio.mskcc.org (valleru at cbio.mskcc.org) Date: Mon, 6 May 2019 02:26:51 -0500 Subject: [neutron] Unable to configure multiple external networks Message-ID: Hello All, I am trying to install Openstack Stein on a single node, with multiple external networks (both networks are also shared). However, i keep getting the following error in the logs, and the router interfaces show as down. 2019-05-06 02:19:45.046 52175 ERROR neutron.agent.l3.agent 2019-05-06 02:19:45.048 52175 INFO neutron.agent.l3.agent [-] Starting router update for a2ec6c99-944e-408a-945a-dffbe09f65ce, action 3, priority 2 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent [-] Failed to process compatible router: a2ec6c99-944e-408a-945a-dffbe09f65ce: Exception: The 'gateway_external_network_id' option must be configured for this agent as Neutron has more than one external network. 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent Traceback (most recent call last): 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 701, in _process_routers_if_compatible 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent     self._process_router_if_compatible(router) 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 548, in _process_router_if_compatible 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent     target_ex_net_id = self._fetch_external_net_id() 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 376, in _fetch_external_net_id 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent     raise Exception(msg) 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent Exception: The 'gateway_external_network_id' option must be configured for this agent as Neutron has more than one external network. 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent 2019-05-06 02:19:46.252 52175 WARNING neutron.agent.l3.agent [-] Hit retry limit with router update for a2ec6c99-944e-408a-945a-dffbe09f65ce, action 3 2019-05-06 02:19:46.253 52175 WARNING neutron.agent.l3.agent [-] Info for router a2ec6c99-944e-408a-945a-dffbe09f65ce was not found. Performing router cleanup I have set these parameters to empty, as mentioned in the docs. 
/etc/neutron/l3_agent.ini gateway_external_network_id = external_network_bridge = interface_driver = openvswitch I tried linuxbridge-agent too,but i could not get rid of the above error.  openstack port list --router router1 +--------------------------------------+------+-------------------+----------------------------------------------------------------------------+--------+ | ID                                   | Name | MAC Address       | Fixed IP Addresses                                                         | Status | +--------------------------------------+------+-------------------+----------------------------------------------------------------------------+--------+ | 1bcaad17-17ed-4383-9206-34417f8fd2df |      | fa:16:3e:c1:b1:1f | ip_address='192.168.1.1', subnet_id='b00cb3bf-ca89-4e00-8bd7-83a75dbb6080' | DOWN   | | f49d976f-b733-4360-9d1f-cdd35ecf54e6 |      | fa:16:3e:54:82:4b | ip_address='10.0.10.11', subnet_id='7cc01a33-f078-494d-9b0b-e988f5b4915d'  | DOWN   | +--------------------------------------+------+-------------------+----------------------------------------------------------------------------+————+ However it does work when i have just one external network  openstack port list --router router1 +--------------------------------------+------+-------------------+--------------------------------------------------------------------------------+--------+ | ID                                   | Name | MAC Address       | Fixed IP Addresses                                                             | Status | +--------------------------------------+------+-------------------+--------------------------------------------------------------------------------+--------+ | cdb06cf7-7492-4275-bd93-88a46b9769a8 |      | fa:16:3e:7c:ea:55 | ip_address='192.168.1.1', subnet_id='b00cb3bf-ca89-4e00-8bd7-83a75dbb6080'     | ACTIVE | | fc9b06d7-d377-451b-9af5-07e1fab072dc |      | fa:16:3e:d0:6d:7c | ip_address='140.163.188.149', subnet_id='4a2bf30a-e7f8-44c1-8b08-4de01b2b1296' | ACTIVE | +--------------------------------------+------+-------------------+--------------------------------------------------------------------------------+--------+ May i please know, how to get the above working. I have seen multiple articles online that mention that this should be working, however i am unable to get this to work. It is really important for us to have to have 2 external networks in the environment, and be able to route to both of them if possible. Thank you, Lohit -------------- next part -------------- An HTML attachment was scrubbed... URL: From valleru at cbio.mskcc.org Mon May 6 07:58:15 2019 From: valleru at cbio.mskcc.org (valleru at cbio.mskcc.org) Date: Mon, 6 May 2019 02:58:15 -0500 Subject: [neutron] Unable to configure multiple external networks In-Reply-To: References: Message-ID: It started to work , after i modified this code: def _fetch_external_net_id(self, force=False):         """Find UUID of single external network for this agent."""         self.conf.gateway_external_network_id = ''         #if self.conf.gateway_external_network_id:         #    return self.conf.gateway_external_network_id         return self.conf.gateway_external_network_id from https://github.com/openstack/neutron/blob/master/neutron/agent/l3/agent.py Looks like, that respective option is not being read correctly from the respective configuration file. 
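For what it is worth, a minimal standalone sketch (plain oslo.config, not neutron's actual option definition) of how an agent option like gateway_external_network_id is registered and read; the option only gets a value from files actually passed via --config-file, so a quick check like this can confirm whether the value in l3_agent.ini is reaching the process at all:

import sys
from oslo_config import cfg

OPTS = [
    cfg.StrOpt('gateway_external_network_id', default='',
               help='UUID of the external network handled by this agent.'),
]

def main(argv):
    conf = cfg.ConfigOpts()
    conf.register_opts(OPTS)
    # Example: python check_opt.py --config-file /etc/neutron/l3_agent.ini
    conf(argv)
    print(repr(conf.gateway_external_network_id))

if __name__ == '__main__':
    main(sys.argv[1:])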
Regards, Lohit On May 6, 2019, 2:28 AM -0500, valleru at cbio.mskcc.org, wrote: > Hello All, > > I am trying to install Openstack Stein on a single node, with multiple external networks (both networks are also shared). > However, i keep getting the following error in the logs, and the router interfaces show as down. > > 2019-05-06 02:19:45.046 52175 ERROR neutron.agent.l3.agent > 2019-05-06 02:19:45.048 52175 INFO neutron.agent.l3.agent [-] Starting router update for a2ec6c99-944e-408a-945a-dffbe09f65ce, action 3, priority 2 > 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent [-] Failed to process compatible router: a2ec6c99-944e-408a-945a-dffbe09f65ce: Exception: The 'gateway_external_network_id' option must be configured for this agent as Neutron has more than one external network. > 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent Traceback (most recent call last): > 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 701, in _process_routers_if_compatible > 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent     self._process_router_if_compatible(router) > 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 548, in _process_router_if_compatible > 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent     target_ex_net_id = self._fetch_external_net_id() > 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent   File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 376, in _fetch_external_net_id > 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent     raise Exception(msg) > 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent Exception: The 'gateway_external_network_id' option must be configured for this agent as Neutron has more than one external network. > 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent > 2019-05-06 02:19:46.252 52175 WARNING neutron.agent.l3.agent [-] Hit retry limit with router update for a2ec6c99-944e-408a-945a-dffbe09f65ce, action 3 > 2019-05-06 02:19:46.253 52175 WARNING neutron.agent.l3.agent [-] Info for router a2ec6c99-944e-408a-945a-dffbe09f65ce was not found. Performing router cleanup > > > I have set these parameters to empty, as mentioned in the docs. > > /etc/neutron/l3_agent.ini > > gateway_external_network_id = > external_network_bridge = > interface_driver = openvswitch > > I tried linuxbridge-agent too,but i could not get rid of the above error. 
> >  openstack port list --router router1 > > +--------------------------------------+------+-------------------+----------------------------------------------------------------------------+--------+ > | ID                                   | Name | MAC Address       | Fixed IP Addresses                                                         | Status | > +--------------------------------------+------+-------------------+----------------------------------------------------------------------------+--------+ > | 1bcaad17-17ed-4383-9206-34417f8fd2df |      | fa:16:3e:c1:b1:1f | ip_address='192.168.1.1', subnet_id='b00cb3bf-ca89-4e00-8bd7-83a75dbb6080' | DOWN   | > | f49d976f-b733-4360-9d1f-cdd35ecf54e6 |      | fa:16:3e:54:82:4b | ip_address='10.0.10.11', subnet_id='7cc01a33-f078-494d-9b0b-e988f5b4915d'  | DOWN   | > +--------------------------------------+------+-------------------+----------------------------------------------------------------------------+————+ > > However it does work when i have just one external network > >  openstack port list --router router1 > +--------------------------------------+------+-------------------+--------------------------------------------------------------------------------+--------+ > | ID                                   | Name | MAC Address       | Fixed IP Addresses                                                             | Status | > +--------------------------------------+------+-------------------+--------------------------------------------------------------------------------+--------+ > | cdb06cf7-7492-4275-bd93-88a46b9769a8 |      | fa:16:3e:7c:ea:55 | ip_address='192.168.1.1', subnet_id='b00cb3bf-ca89-4e00-8bd7-83a75dbb6080'     | ACTIVE | > | fc9b06d7-d377-451b-9af5-07e1fab072dc |      | fa:16:3e:d0:6d:7c | ip_address='140.163.188.149', subnet_id='4a2bf30a-e7f8-44c1-8b08-4de01b2b1296' | ACTIVE | > +--------------------------------------+------+-------------------+--------------------------------------------------------------------------------+--------+ > > May i please know, how to get the above working. > I have seen multiple articles online that mention that this should be working, however i am unable to get this to work. > It is really important for us to have to have 2 external networks in the environment, and be able to route to both of them if possible. > > > Thank you, > Lohit > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ifatafekn at gmail.com Mon May 6 08:50:45 2019 From: ifatafekn at gmail.com (Ifat Afek) Date: Mon, 6 May 2019 11:50:45 +0300 Subject: [vitrage] No IRC meeting this week Message-ID: Hi, The IRC meeting this week is canceled, since most of Vitrage contributors will be on vacation. We will meet again on Wednesday, May 15. Thanks, Ifat -------------- next part -------------- An HTML attachment was scrubbed... URL: From balazs.gibizer at ericsson.com Mon May 6 09:14:00 2019 From: balazs.gibizer at ericsson.com (=?utf-8?B?QmFsw6F6cyBHaWJpemVy?=) Date: Mon, 6 May 2019 09:14:00 +0000 Subject: [placement][ptg] Open Questions In-Reply-To: References: Message-ID: <1557134030.12068.0@smtp.office365.com> On Mon, May 6, 2019 at 3:23 AM, Chris Dent wrote: > > A few questions we failed to resolve during the PTG that we should > work out over the next couple of weeks. 
> > * There are two specs in progress related to more flexible ways to > filter traits: > > * any trait in allocation candidates > > https://protect2.fireeye.com/url?k=d24a660a-8ec044e0-d24a2691-0cc47ad93e32-6536d6dba76bfa81&u=https://review.opendev.org/#/c/649992/ > > * support mixing required traits with any traits > > https://protect2.fireeye.com/url?k=dbbffb10-8735d9fa-dbbfbb8b-0cc47ad93e32-545ba4b564785811&u=https://review.opendev.org/#/c/649368/ > > Do we have pending non-placement features which depend on the > above being completed? I got the impression during the > nova-placement xproj session that maybe they were, but it's not > clear. Anyone willing to state one way or another? From the first spec: "This is required for the case when a Neutron network maps to more than one physnets but the port's bandwidth request can be fulfilled from any physnet the port's network maps to." So yes there is a use case that can only be supported if placement supports any traits in a_c query. It is to support multisegment neutron networks with QoS minimum bandwidth rule and with more than one segment mapped to physnet. A reason we did not discussed it in detail is that the use case was downprioritized on my side. (see https://etherpad.openstack.org/p/ptg-train-xproj-nova-neutron L40) > > * We had several RFE stories already in progress, and have added a > few more during the PTG. We have not done much in the way of > prioritizing these. We certainly can't do them all. Here's a link > to the current RFE stories in the placement group (this includes > placement, osc-placement and os-*). > > https://storyboard.openstack.org/#!/worklist/594 > > I've made a simple list of those on an etherpad, please register > you +1 or -1 (or nothing) on each of those. Keep in mind that > there are several features in "Update nested provider support to > address train requirements" and that we've already committed to > them. Did you forget to paste the etherpad link? > > Please let me know what I've forgotten. > > -- > Chris Dent ٩◔̯◔۶ > https://protect2.fireeye.com/url?k=2e065f7d-728c7d97-2e061fe6-0cc47ad93e32-0c1780ffb89507f5&u=https://anticdent.org/ > freenode: cdent tw: @anticdent From emilien at redhat.com Mon May 6 09:27:05 2019 From: emilien at redhat.com (Emilien Macchi) Date: Mon, 6 May 2019 11:27:05 +0200 Subject: [tripleo] deprecating keepalived support Message-ID: We introduced Keepalived a long time ago when we wanted to manage virtual IPs (VIPs) on the Undercloud when SSL is enabled and also for an HA alternative to Pacemaker on the overcloud, The multi-node undercloud with more than once instance of Keepalived never got attraction (so VRRP hasn't been useful for us), and Pacemaker is the de-facto tool to control HA VIPs on the Overcloud. Therefore, let's continue to trim-down our services and deprecate Keepalived. https://blueprints.launchpad.net/tripleo/+spec/replace-keepalived-undercloud The creation of control plane IP & public host IP can be done with os-net-config, and the upgrade path is simple. I've been working on 2 patches: # Introduce tripleo-container-rpm role https://review.opendev.org/#/c/657279/ Deprecate tripleo-docker-rm and add a generic role which supports both Docker & Podman. In the case of Podman, we cleanup the systemd services and container. 
# Deprecate Keepalived https://review.opendev.org/#/c/657067/ Remove Keepalived from all the roles, deprecate the service, tear-down Keepalived from the HAproxy service (if it was running), and use os-net-config to configure the interfaces previously managed by Keepalived service. I've tested the upgrade and it seems to work fine: https://asciinema.org/a/MpKBYU1PFvXcYai7aHUwy79LK Please let us know any concern and we'll address it. Thanks, -- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... URL: From balazs.gibizer at ericsson.com Mon May 6 09:33:36 2019 From: balazs.gibizer at ericsson.com (=?utf-8?B?QmFsw6F6cyBHaWJpemVy?=) Date: Mon, 6 May 2019 09:33:36 +0000 Subject: [placement][nova][ptg] Summary: Consumer Types In-Reply-To: References: Message-ID: <1557135206.12068.1@smtp.office365.com> On Mon, May 6, 2019 at 1:54 AM, Chris Dent wrote: > > We had a brief conversation in the placement room yesterday > (Saturday May 5th) to confirm we were all on the same page with > regard to consumer types. These provide a way to say that a set of > allocations "is an instance" or "is a migration" and will help with > quota accounting. > > We decided that since no one has stepped forward with a more > complicated scenario, at this time, we will go with the simplest > implementation, for now: > > * add a consumer types table that has a key and string (length to be > determined, values controlled by clients) that represents a "type". > For example (1, 'instance') > > * add a column on consumer table that takes one of those keys > > * create a new row in the types table only when a new type is > created, don't worry about expiring them > > * provide an online migration to default existing consumers to > 'instance' and treat unset types as 'instance' [1]. This probably > needs some confirmation from mel and others that it is suitable. > If not, please provide an alternative suggestion. If there are ongoing migration then defaulting the consumer type to instance might be incorrect. However nova already has a mechanism to distingush between migration and instance consumer so nova won't break by this. Still nova might want to fix this placement data inconsistency. I guess the new placement microversion will allow to update the consumer type of an allocation. Cheers, gibi > * In a new microversion: allow queries to /usages to use a consumer > type parameter to limit results to particular types and add > 'consumer_type' key will be added to the body of an 'allocations' > in both PUT and POST. > > * We did not discuss in the room, but the email thread [2] did: We > may need to consider grouping /usages results by type but we could > probably get by without changing that (and do multiple requests, > sometimes). > > Surya, thank her very much, has volunteered to work on this and has > started a spec at [3]. > > We have decided, again due to lack of expressed demand, to do any > work (at this time) related to resource provider partitioning [4]. > > There's a pretty good idea on how to do this, but enough other stuff > going on there's not time. Because we decided in that thread that > any one resource provider can only be in one partition, there is > also a very easy workaround: Run another placement server. It takes > only a few minutes to set one up [5] > > This means that all of the client services of a single placement > service need to coordinate on what consumer types they are using. > (This was already true, but stated here for emphasis.) 
> > [1] I'm tempted to test how long a million or so rows of consumers > would take to update. If it is short enough we may wish to break > with the nova tradition of not doing data migrations in schema > migrations (placement-manage db sync). But we didn't get a chance to > discuss that in the room. > > [2] > http://lists.openstack.org/pipermail/openstack-discuss/2019-April/thread.html#4720 > > [3] > https://protect2.fireeye.com/url?k=e2926c01-be19673d-e2922c9a-86ef624f95b6-55a34a8e4a7579ba&u=https://review.opendev.org/#/c/654799/ > > [4] > http://lists.openstack.org/pipermail/openstack-discuss/2019-April/004721.html > > [5] https://docs.openstack.org/placement/latest/install/from-pypi.html > > -- > Chris Dent ٩◔̯◔۶ > https://protect2.fireeye.com/url?k=b585a35f-e90ea863-b585e3c4-86ef624f95b6-8f691958e6e41ae2&u=https://anticdent.org/ > freenode: cdent tw: @anticdent From skaplons at redhat.com Mon May 6 09:49:37 2019 From: skaplons at redhat.com (Slawomir Kaplonski) Date: Mon, 6 May 2019 11:49:37 +0200 Subject: [neutron] Unable to configure multiple external networks In-Reply-To: References: Message-ID: Hi, It is known and already reported issue. Please see https://bugs.launchpad.net/neutron/+bug/1824571 > On 6 May 2019, at 09:58, valleru at cbio.mskcc.org wrote: > > It started to work , after i modified this code: > > def _fetch_external_net_id(self, force=False): > """Find UUID of single external network for this agent.""" > self.conf.gateway_external_network_id = '' > #if self.conf.gateway_external_network_id: > # return self.conf.gateway_external_network_id > return self.conf.gateway_external_network_id > > from https://github.com/openstack/neutron/blob/master/neutron/agent/l3/agent.py > > Looks like, that respective option is not being read correctly from the respective configuration file. > > Regards, > Lohit > > On May 6, 2019, 2:28 AM -0500, valleru at cbio.mskcc.org, wrote: >> Hello All, >> >> I am trying to install Openstack Stein on a single node, with multiple external networks (both networks are also shared). >> However, i keep getting the following error in the logs, and the router interfaces show as down. >> >> 2019-05-06 02:19:45.046 52175 ERROR neutron.agent.l3.agent >> 2019-05-06 02:19:45.048 52175 INFO neutron.agent.l3.agent [-] Starting router update for a2ec6c99-944e-408a-945a-dffbe09f65ce, action 3, priority 2 >> 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent [-] Failed to process compatible router: a2ec6c99-944e-408a-945a-dffbe09f65ce: Exception: The 'gateway_external_network_id' option must be configured for this agent as Neutron has more than one external network. 
>> 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent Traceback (most recent call last): >> 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 701, in _process_routers_if_compatible >> 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent self._process_router_if_compatible(router) >> 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 548, in _process_router_if_compatible >> 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent target_ex_net_id = self._fetch_external_net_id() >> 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 376, in _fetch_external_net_id >> 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent raise Exception(msg) >> 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent Exception: The 'gateway_external_network_id' option must be configured for this agent as Neutron has more than one external network. >> 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent >> 2019-05-06 02:19:46.252 52175 WARNING neutron.agent.l3.agent [-] Hit retry limit with router update for a2ec6c99-944e-408a-945a-dffbe09f65ce, action 3 >> 2019-05-06 02:19:46.253 52175 WARNING neutron.agent.l3.agent [-] Info for router a2ec6c99-944e-408a-945a-dffbe09f65ce was not found. Performing router cleanup >> >> >> I have set these parameters to empty, as mentioned in the docs. >> >> /etc/neutron/l3_agent.ini >> >> gateway_external_network_id = >> external_network_bridge = >> interface_driver = openvswitch >> >> I tried linuxbridge-agent too,but i could not get rid of the above error. >> >> openstack port list --router router1 >> >> +--------------------------------------+------+-------------------+----------------------------------------------------------------------------+--------+ >> | ID | Name | MAC Address | Fixed IP Addresses | Status | >> +--------------------------------------+------+-------------------+----------------------------------------------------------------------------+--------+ >> | 1bcaad17-17ed-4383-9206-34417f8fd2df | | fa:16:3e:c1:b1:1f | ip_address='192.168.1.1', subnet_id='b00cb3bf-ca89-4e00-8bd7-83a75dbb6080' | DOWN | >> | f49d976f-b733-4360-9d1f-cdd35ecf54e6 | | fa:16:3e:54:82:4b | ip_address='10.0.10.11', subnet_id='7cc01a33-f078-494d-9b0b-e988f5b4915d' | DOWN | >> +--------------------------------------+------+-------------------+----------------------------------------------------------------------------+————+ >> >> However it does work when i have just one external network >> >> openstack port list --router router1 >> +--------------------------------------+------+-------------------+--------------------------------------------------------------------------------+--------+ >> | ID | Name | MAC Address | Fixed IP Addresses | Status | >> +--------------------------------------+------+-------------------+--------------------------------------------------------------------------------+--------+ >> | cdb06cf7-7492-4275-bd93-88a46b9769a8 | | fa:16:3e:7c:ea:55 | ip_address='192.168.1.1', subnet_id='b00cb3bf-ca89-4e00-8bd7-83a75dbb6080' | ACTIVE | >> | fc9b06d7-d377-451b-9af5-07e1fab072dc | | fa:16:3e:d0:6d:7c | ip_address='140.163.188.149', subnet_id='4a2bf30a-e7f8-44c1-8b08-4de01b2b1296' | ACTIVE | >> 
+--------------------------------------+------+-------------------+--------------------------------------------------------------------------------+--------+ >> >> May i please know, how to get the above working. >> I have seen multiple articles online that mention that this should be working, however i am unable to get this to work. >> It is really important for us to have to have 2 external networks in the environment, and be able to route to both of them if possible. >> >> >> Thank you, >> Lohit >> — Slawek Kaplonski Senior software engineer Red Hat From doka.ua at gmx.com Mon May 6 10:50:26 2019 From: doka.ua at gmx.com (Volodymyr Litovka) Date: Mon, 6 May 2019 13:50:26 +0300 Subject: [octavia] Amphora agent returned unexpected result code 500 In-Reply-To: <5798b929-737e-fd29-a2a5-7c1246a632bb@gmx.com> References: <5798b929-737e-fd29-a2a5-7c1246a632bb@gmx.com> Message-ID: Hi, I did some additional tests (out of Octavia, reproducing Octavia's model) to check whether granted roles are enough. Seems, enough: 1) I have "customer" project with plenty of VM's connected to project's local (not shared) network (b24d2...) and subnet (24b10...), which is supposed to be vip-subnet: # openstack subnet show 24b10... project_id: ec62f... 2) I have "octavia" project, where users octavia, nova and neutron have "admin" role 3) under user "octavia" I create port in project "octavia", connected to "customer"s subnet and bind it to VM: octavia at octavia$ openstack port create --network b24d2... --fixed-ip subnet=24b10... --disable-port-security tport port id: 1c883... project_id: 41a02... octavia at octavia$ openstack server create --image cirros-0.4 --flavor B1 --nic port-id=1c883... tserv project_id: 41a02... 4) finally, I able to ping test server from customer project's VMs, despite the fact they're in different projects So it seems that roles to reproduce Octavia's model are enough and Openstack configured in the right way. On 5/6/19 12:34 AM, Volodymyr Litovka wrote: > Dear colleagues, > > trying to launch Amphorae, getting the following error in logs: > > Amphora agent returned unexpected result code 500 with response > {'message': 'Error plugging VIP', 'details': 'SIOCADDRT: Network is > unreachable\nFailed to bring up eth1.\n'} > > While details below, questions are here: > - whether it's enough to assign roles as explained below to special > project for Octavia? > - whether it can be issue with image, created by diskimage_create.sh? > - any recommendation on where to search for the problem. > > Thank you. > > My environment is: > - Openstack Rocky > - Octavia 4.0 > - amphora instance runs in special project "octavia", where users > octavia, nova and neutron have admin role > - amphora image prepared using original git repo process and elements > without modification: > * git clone > * cd octavia > * diskimage-create/diskimage-create.sh > * openstack image create [ ... ] --tag amphora > > After created, amphora instance successfully connects to management > network and can be accessed by controller: > > 2019-05-05 20:46:06.851 18234 DEBUG > octavia.amphorae.drivers.haproxy.rest_api_driver [-] Connected to > amphora. 
Response: request > /opt/openstack/lib/python3.6/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py:486 > 2019-05-05 20:46:06.852 18234 DEBUG > octavia.controller.worker.tasks.amphora_driver_tasks [-] Successfuly > connected to amphora 5bec4c09-a209-4e73-a66e-e4fc0fb8ded5: > {'ipvsadm_version': '1:1.28-3', 'api_version': '0.5', > 'haproxy_version': '1.6.3-1ubuntu0.2', 'hostname': > 'amphora-5bec4c09-a209-4e73-a66e-e4fc0fb8ded5', 'keepalived_version': > '1:1.2.24-1ubuntu0.16.04.1'} execute > /opt/openstack/lib/python3.6/site-packages/octavia/controller/worker/tasks/amphora_driver_tasks.py:372 > [ ... ] > 2019-05-05 20:46:06.990 18234 DEBUG > octavia.controller.worker.tasks.network_tasks [-] Plumbing VIP for > amphora id: 5bec4c09-a209-4e73-a66e-e4fc0fb8ded5 execute > /opt/openstack/lib/python3.6/site-packages/octavia/controller/worker/tasks/network_tasks.py:382 > 2019-05-05 20:46:07.003 18234 DEBUG > octavia.network.drivers.neutron.base [-] Neutron extension > security-group found enabled _check_extension_enabled > /opt/openstack/lib/python3.6/site-packages/octavia/network/drivers/neutron/base.py:66 > 2019-05-05 20:46:07.013 18234 DEBUG > octavia.network.drivers.neutron.base [-] Neutron extension > dns-integration found enabled _check_extension_enabled > /opt/openstack/lib/python3.6/site-packages/octavia/network/drivers/neutron/base.py:66 > 2019-05-05 20:46:07.025 18234 DEBUG > octavia.network.drivers.neutron.base [-] Neutron extension qos found > enabled _check_extension_enabled > /opt/openstack/lib/python3.6/site-packages/octavia/network/drivers/neutron/base.py:66 > 2019-05-05 20:46:07.044 18234 DEBUG > octavia.network.drivers.neutron.base [-] Neutron extension > allowed-address-pairs found enabled _check_extension_enabled > /opt/openstack/lib/python3.6/site-packages/octavia/network/drivers/neutron/base.py:66 > 2019-05-05 20:46:08.406 18234 DEBUG > octavia.network.drivers.neutron.allowed_address_pairs [-] Created vip > port: b0398cc8-6d52-4f12-9f1f-1141b0f10751 for amphora: > 5bec4c09-a209-4e73-a66e-e4fc0fb8ded5 _plug_amphora_vip > /opt/openstack/lib/python3.6/site-packages/octavia/network/drivers/neutron/allowed_address_pairs.py:97 > [ ... ] > 2019-05-05 20:46:15.405 18234 DEBUG > octavia.network.drivers.neutron.allowed_address_pairs [-] Retrieving > network details for amphora 5bec4c09-a209-4e73-a66e-e4fc0fb8ded5 > _get_amp_net_configs > /opt/openstack/lib/python3.6/site-packages/octavia/network/drivers/neutron/allowed_address_pairs.py:596 > [ ... ] > 2019-05-05 20:46:15.837 18234 DEBUG > octavia.amphorae.drivers.haproxy.rest_api_driver [-] Post-VIP-Plugging > with vrrp_ip 10.0.2.13 vrrp_port b0398cc8-6d52-4f12-9f1f-1141b0f10751 > post_vip_plug > /opt/openstack/lib/python3.6/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py:233 > 2019-05-05 20:46:15.838 18234 DEBUG > octavia.amphorae.drivers.haproxy.rest_api_driver [-] request url > plug/vip/10.0.2.24 request > /opt/openstack/lib/python3.6/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py:462 > 2019-05-05 20:46:15.838 18234 DEBUG > octavia.amphorae.drivers.haproxy.rest_api_driver [-] request url > https://172.16.252.35:9443/0.5/plug/vip/10.0.2.24 request > /opt/openstack/lib/python3.6/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py:465 > 2019-05-05 20:46:16.089 18234 DEBUG > octavia.amphorae.drivers.haproxy.rest_api_driver [-] Connected to > amphora. 
Response: request > /opt/openstack/lib/python3.6/site-packages/octavia/amphorae/drivers/haproxy/rest_api_driver.py:486 > 2019-05-05 20:46:16.090 18234 ERROR > octavia.amphorae.drivers.haproxy.exceptions [-] Amphora agent returned > unexpected result code 500 with response {'message': 'Error plugging > VIP', 'details': 'SIOCADDRT: Network is unreachable\nFailed to bring > up eth1.\n'} > > During the process, NEUTRON logs contains the following records that > indicate the following (note "status=DOWN" in neutron-dhcp-agent; > later immediately before to be deleted, it will shed 'ACTIVE'): > > May  5 20:46:13 ardbeg neutron-dhcp-agent: 2019-05-05 20:46:13.857 > 1804 INFO neutron.agent.dhcp.agent > [req-07833602-9579-403b-a264-76fd3ee408ee > a18f38c780074c6280dde5edad159666 41a02a69918849509f4102b04f8a7de9 - - > -] Trigger reload_allocations for port admin_state_up=True, > allowed_address_pairs=[{u'ip_address': u'10.0.2.24', u'mac_address': > u'72:d0:1c:4c:94:91'}], binding:host_id=ardbeg, binding:profile=, > binding:vif_details=datapath_type=system, ovs_hybrid_plug=False, > port_filter=True, binding:vif_type=ovs, binding:vnic_type=normal, > created_at=2019-05-05T20:46:07Z, description=, > device_id=f1bce6e9-be5b-464b-8f64-686f36e9de1f, > device_owner=compute:nova, dns_assignment=[{u'hostname': > u'amphora-5bec4c09-a209-4e73-a66e-e4fc0fb8ded5', u'ip_address': > u'10.0.2.13', u'fqdn': > u'amphora-5bec4c09-a209-4e73-a66e-e4fc0fb8ded5.loqal.'}], dns_domain=, > dns_name=amphora-5bec4c09-a209-4e73-a66e-e4fc0fb8ded5, > extra_dhcp_opts=[], fixed_ips=[{u'subnet_id': > u'24b10886-3d53-4aee-bdc6-f165b242ae4f', u'ip_address': > u'10.0.2.13'}], id=b0398cc8-6d52-4f12-9f1f-1141b0f10751, > mac_address=72:d0:1c:4c:94:91, > name=octavia-lb-vrrp-5bec4c09-a209-4e73-a66e-e4fc0fb8ded5, > network_id=b24d2830-eec6-4abd-82f2-ac71c8ecbf40, > port_security_enabled=True, > project_id=41a02a69918849509f4102b04f8a7de9, qos_policy_id=None, > revision_number=5, > security_groups=[u'6df53a15-6afc-4c99-b464-03de4f546b4f'], > status=DOWN, tags=[], tenant_id=41a02a69918849509f4102b04f8a7de9, > updated_at=2019-05-05T20:46:13Z > May  5 20:46:14 ardbeg neutron-openvswitch-agent: 2019-05-05 > 20:46:14.185 31542 INFO > neutron.plugins.ml2.drivers.openvswitch.agent.ovs_neutron_agent > [req-a4425cdb-afc1-4f6a-9ef9-c8706e3285d6 - - - - -] Port > b0398cc8-6d52-4f12-9f1f-1141b0f10751 updated. 
Details: {'profile': {}, > 'network_qos_policy_id': None, 'qos_policy_id': None, > 'allowed_address_pairs': [{'ip_address': > AuthenticIPNetwork('10.0.2.24'), 'mac_address': > EUI('72:d0:1c:4c:94:91')}], 'admin_state_up': True, 'network_id': > 'b24d2830-eec6-4abd-82f2-ac71c8ecbf40', 'segmentation_id': 437, > 'fixed_ips': [{'subnet_id': '24b10886-3d53-4aee-bdc6-f165b242ae4f', > 'ip_address': '10.0.2.13'}], 'device_owner': u'compute:nova', > 'physical_network': None, 'mac_address': '72:d0:1c:4c:94:91', > 'device': u'b0398cc8-6d52-4f12-9f1f-1141b0f10751', > 'port_security_enabled': True, 'port_id': > 'b0398cc8-6d52-4f12-9f1f-1141b0f10751', 'network_type': u'vxlan', > 'security_groups': [u'6df53a15-6afc-4c99-b464-03de4f546b4f']} > May  5 20:46:14 ardbeg neutron-openvswitch-agent: 2019-05-05 > 20:46:14.197 31542 INFO neutron.agent.securitygroups_rpc > [req-a4425cdb-afc1-4f6a-9ef9-c8706e3285d6 - - - - -] Preparing filters > for devices set([u'b0398cc8-6d52-4f12-9f1f-1141b0f10751']) > > Note Nova returns response 200/completed: > > May  5 20:46:14 controller-l neutron-server: 2019-05-05 20:46:14.326 > 20981 INFO neutron.notifiers.nova [-] Nova event response: {u'status': > u'completed', u'tag': u'b0398cc8-6d52-4f12-9f1f-1141b0f10751', > u'name': u'network-changed', u'server_uuid': > u'f1bce6e9-be5b-464b-8f64-686f36e9de1f', u'code': 200} > > and "openstack server show" shows both NICs are attached to the amphorae: > > $ openstack server show f1bce6e9-be5b-464b-8f64-686f36e9de1f > +-------------------------------------+------------------------------------------------------------+ > | Field                               | Value                                                      | > +-------------------------------------+------------------------------------------------------------+ > [ ... 
] > | addresses                           | octavia-net=172.16.252.35; u1000-p1000-xbone=10.0.2.13     | > +-------------------------------------+------------------------------------------------------------+ > > Later Octavia worker reports the following: > > 2019-05-05 20:46:16.124 18234 DEBUG > octavia.controller.worker.controller_worker [-] Task > 'STANDALONE-octavia-plug-net-subflow-octavia-amp-post-vip-plug' > (f105ced1-72c6-4116-b582-599a21cdee36) transitioned into state > 'REVERTING' from state 'FAILURE' _task_receiver > /opt/openstack/lib/python3.6/site-packages/taskflow/listeners/logging.py:194 > 2019-05-05 20:46:16.127 18234 WARNING > octavia.controller.worker.controller_worker [-] Task > 'STANDALONE-octavia-plug-net-subflow-octavia-amp-post-vip-plug' > (f105ced1-72c6-4116-b582-599a21cdee36) transitioned into state > 'REVERTED' from state 'REVERTING' with result 'None' > 2019-05-05 20:46:16.141 18234 DEBUG > octavia.controller.worker.controller_worker [-] Task > 'STANDALONE-octavia-plug-net-subflow-reload-amp-after-plug-vip' > (c4d6222e-2508-4a9c-9514-e7f9bcf84e31) transitioned into state > 'REVERTING' from state 'SUCCESS' _task_receiver > /opt/openstack/lib/python3.6/site-packages/taskflow/listeners/logging.py:194 > 2019-05-05 20:46:16.142 18234 WARNING > octavia.controller.worker.controller_worker [-] Task > 'STANDALONE-octavia-plug-net-subflow-reload-amp-after-plug-vip' > (c4d6222e-2508-4a9c-9514-e7f9bcf84e31) transitioned into state > 'REVERTED' from state 'REVERTING' with result 'None' > 2019-05-05 20:46:16.146 18234 DEBUG > octavia.controller.worker.controller_worker [-] Task > 'STANDALONE-octavia-plug-net-subflow-ocatvia-amp-update-vip-data' > (2e1d1a04-282d-43b7-8c4f-fe31e75804ea) transitioned into state > 'REVERTING' from state 'SUCCESS' _task_receiver > /opt/openstack/lib/python3.6/site-packages/taskflow/listeners/logging.py:194 > 2019-05-05 20:46:16.148 18234 WARNING > octavia.controller.worker.controller_worker [-] Task > 'STANDALONE-octavia-plug-net-subflow-ocatvia-amp-update-vip-data' > (2e1d1a04-282d-43b7-8c4f-fe31e75804ea) transitioned into state > 'REVERTED' from state 'REVERTING' with result 'None' > 2019-05-05 20:46:16.173 18234 DEBUG > octavia.controller.worker.controller_worker [-] Task > 'STANDALONE-octavia-plug-net-subflow-octavia-amp-plug-vip' > (c63a5bed-f531-4ed3-83d2-bce72e835932) transitioned into state > 'REVERTING' from state 'SUCCESS' _task_receiver > /opt/openstack/lib/python3.6/site-packages/taskflow/listeners/logging.py:194 > 2019-05-05 20:46:16.174 18234 WARNING > octavia.controller.worker.tasks.network_tasks [-] Unable to plug VIP > for amphora id 5bec4c09-a209-4e73-a66e-e4fc0fb8ded5 load balancer id > e01c6ff5-179a-4ed5-ae5d-1d00d6c584b8 > > and Neutron then deletes port but NOTE that immediately before > deletion port reported by neutron-dhcp-agent as ACTIVE: > > May  5 20:46:17 ardbeg neutron-dhcp-agent: 2019-05-05 20:46:17.080 > 1804 INFO neutron.agent.dhcp.agent > [req-835e5b91-28e5-44b9-a463-d04a0323294f > a18f38c780074c6280dde5edad159666 41a02a69918849509f4102b04f8a7de9 - - > -] Trigger reload_allocations for port admin_state_up=True, > allowed_address_pairs=[], binding:host_id=ardbeg, binding:profile=, > binding:vif_details=datapath_type=system, ovs_hybrid_plug=False, > port_filter=True, binding:vif_type=ovs, binding:vnic_type=normal, > created_at=2019-05-05T20:46:07Z, description=, > device_id=f1bce6e9-be5b-464b-8f64-686f36e9de1f, > device_owner=compute:nova, dns_assignment=[{u'hostname': > 
u'amphora-5bec4c09-a209-4e73-a66e-e4fc0fb8ded5', u'ip_address': > u'10.0.2.13', u'fqdn': > u'amphora-5bec4c09-a209-4e73-a66e-e4fc0fb8ded5.loqal.'}], dns_domain=, > dns_name=amphora-5bec4c09-a209-4e73-a66e-e4fc0fb8ded5, > extra_dhcp_opts=[], fixed_ips=[{u'subnet_id': > u'24b10886-3d53-4aee-bdc6-f165b242ae4f', u'ip_address': > u'10.0.2.13'}], id=b0398cc8-6d52-4f12-9f1f-1141b0f10751, > mac_address=72:d0:1c:4c:94:91, > name=octavia-lb-vrrp-5bec4c09-a209-4e73-a66e-e4fc0fb8ded5, > network_id=b24d2830-eec6-4abd-82f2-ac71c8ecbf40, > port_security_enabled=True, > project_id=41a02a69918849509f4102b04f8a7de9, qos_policy_id=None, > revision_number=8, > security_groups=[u'ba20352e-95b9-4c97-a688-59d44e3aa8cf'], > status=ACTIVE, tags=[], tenant_id=41a02a69918849509f4102b04f8a7de9, > updated_at=2019-05-05T20:46:16Z > May  5 20:46:17 controller-l neutron-server: 2019-05-05 20:46:17.086 > 20981 INFO neutron.wsgi [req-835e5b91-28e5-44b9-a463-d04a0323294f > a18f38c780074c6280dde5edad159666 41a02a69918849509f4102b04f8a7de9 - > default default] 10.0.10.31 "PUT > /v2.0/ports/b0398cc8-6d52-4f12-9f1f-1141b0f10751 HTTP/1.1" status: > 200  len: 1395 time: 0.6318841 > May  5 20:46:17 controller-l neutron-server: 2019-05-05 20:46:17.153 > 20981 INFO neutron.wsgi [req-37ee0da3-8dcc-4fb8-9cd3-91c5a8dcedef > a18f38c780074c6280dde5edad159666 41a02a69918849509f4102b04f8a7de9 - > default default] 10.0.10.31 "GET > /v2.0/ports/b0398cc8-6d52-4f12-9f1f-1141b0f10751 HTTP/1.1" status: > 200  len: 1395 time: 0.0616651 > May  5 20:46:18 controller-l neutron-server: 2019-05-05 20:46:18.179 > 20981 INFO neutron.wsgi [req-8896542e-5dcb-4e6d-9379-04cd88c4035b > a18f38c780074c6280dde5edad159666 41a02a69918849509f4102b04f8a7de9 - > default default] 10.0.10.31 "DELETE > /v2.0/ports/b0398cc8-6d52-4f12-9f1f-1141b0f10751 HTTP/1.1" status: > 204  len: 149 time: 1.0199890 > > Thank you. > > -- > Volodymyr Litovka > "Vision without Execution is Hallucination." -- Thomas Edison -- Volodymyr Litovka "Vision without Execution is Hallucination." -- Thomas Edison -------------- next part -------------- An HTML attachment was scrubbed... URL: From nate.johnston at redhat.com Mon May 6 11:47:34 2019 From: nate.johnston at redhat.com (Nate Johnston) Date: Mon, 6 May 2019 07:47:34 -0400 Subject: [qa][ptg][nova][cinder][keystone][neutron][glance][swift][placement] How to make integrated-gate testing (tempest-full) more stable and fast In-Reply-To: <16a86db6ccd.d787148123989.2198391414179782565@ghanshyammann.com> References: <16a86db6ccd.d787148123989.2198391414179782565@ghanshyammann.com> Message-ID: <20190506114734.mehzyjf7dhj6mqkr@bishop> I think this is a really great approach. +1 Nate On Sun, May 05, 2019 at 02:18:08AM -0500, Ghanshyam Mann wrote: > Current integrated-gate jobs (tempest-full) is not so stable for various bugs specially timeout. We tried > to improve it via filtering the slow tests in the separate tempest-slow job but the situation has not been improved much. > > We talked about the Ideas to make it more stable and fast for projects especially when failure is not > related to each project. We are planning to split the integrated-gate template (only tempest-full job as > first step) per related services. > > Idea: > - Run only dependent service tests on project gate. > - Tempest gate will keep running all the services tests as the integrated gate at a centeralized place without any change in the current job. > - Each project can run the below mentioned template. > - All below template will be defined and maintained by QA team. 
> > I would like to know each 6 services which run integrated-gate jobs > > 1."Integrated-gate-networking" (job to run on neutron gate) > Tests to run in this template: neutron APIs , nova APIs, keystone APIs ? All scenario currently running in tempest-full in the same way ( means non-slow and in serial) > Improvement for neutron gate: exlcude the cinder API tests, glance API tests, swift API tests, > > 2."Integrated-gate-storage" (job to run on cinder gate, glance gate) > Tests to run in this template: Cinder APIs , Glance APIs, Swift APIs, Nova APIs and All scenario currently running in tempest-full in the same way ( means non-slow and in serial) > Improvement for cinder, glance gate: excluded the neutron APIs tests, Keystone APIs tests > > 3. "Integrated-gate-object-storage" (job to run on swift gate) > Tests to run in this template: Cinder APIs , Glance APIs, Swift APIs and All scenario currently running in tempest-full in the same way ( means non-slow and in serial) > Improvement for swift gate: excluded the neutron APIs tests, - Keystone APIs tests, - Nova APIs tests. > Note: swift does not run integrated-gate as of now. > > 4. "Integrated-gate-compute" (job to run on Nova gate) > tests to run is : Nova APIs, Cinder APIs , Glance APIs ?, neutron APIs and All scenario currently running in tempest-full in same way ( means non-slow and in serial) > Improvement for Nova gate: excluded the swift APIs tests(not running in current job but in future, it might), Keystone API tests. > > 5. "Integrated-gate-identity" (job to run on keystone gate) > Tests to run is : all as all project use keystone, we might need to run all tests as it is running in integrated-gate. > But does keystone is being unsed differently by all services? if no then, is it enough to run only single service tests say Nova or neutron ? > > 6. "Integrated-gate-placement" (job to run on placement gate) > Tests to run in this template: Nova APIs tests, Neutron APIs tests + scenario tests + any new service depends on placement APIs > Improvement for placement gate: excluded the glance APIs tests, cinder APIs tests, swift APIs tests, keystone APIs tests > > Thoughts on this approach? > > The important point is we must not lose the coverage of integrated testing per project. So I would like to > get each project view if we are missing any dependency (proposed tests removal) in above proposed templates. > > - https://etherpad.openstack.org/p/qa-train-ptg > > -gmann > > From pawel.konczalski at everyware.ch Mon May 6 12:01:31 2019 From: pawel.konczalski at everyware.ch (Pawel Konczalski) Date: Mon, 6 May 2019 14:01:31 +0200 Subject: OpenStack Kubernetes uninitialized taint on minion nodes Message-ID: <76abf981-543b-1742-2ab3-5423ba93b0d0@everyware.ch> Hi, i try to deploy a Kubernetes cluster with OpenStack Magnum. So far the deployment works fine except for the uninitialized taints attribute on the worker / minion nodes. This has to be removed manually, only after that is it possible to deploy containers in the cluster. Any idea how to fix / automate this that Magnum automaticaly deploy functional cluster? 
openstack coe cluster template create kubernetes-cluster-template \   --image Fedora-AtomicHost-29-20190429.0.x86_64 \   --external-network public \   --dns-nameserver 8.8.8.8 \   --master-flavor m1.kubernetes \   --flavor m1.kubernetes \   --coe kubernetes \   --volume-driver cinder \   --network-driver flannel \   --docker-volume-size 25 openstack coe cluster create kubernetes-cluster \   --cluster-template kubernetes-cluster-template \   --master-count 1 \   --node-count 2 \   --keypair mykey kubectl describe nodes | grep Taints [fedora at kubernetes-cluster9-efikj2wr5lsi-master-0 ~]$ kubectl describe nodes | grep Taints Taints:             CriticalAddonsOnly=True:NoSchedule Taints: node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule Taints: node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule kubectl taint nodes --all node.cloudprovider.kubernetes.io/uninitialized- [root at kubernetes-cluster31-vrmbz6yjvuvd-master-0 /]# kubectl describe nodes | grep Taints Taints:             dedicated=master:NoSchedule Taints:             Taints:             BR Pawel -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5227 bytes Desc: not available URL: From lyarwood at redhat.com Mon May 6 13:18:34 2019 From: lyarwood at redhat.com (Lee Yarwood) Date: Mon, 6 May 2019 14:18:34 +0100 Subject: [nova][cinder][ptg] Summary: Swap volume woes Message-ID: <20190506131834.nyc7k7qltdsmamuq@lyarwood.usersys.redhat.com> Hello, tl;dr - No objections to reworking the swap volume API in Train https://etherpad.openstack.org/p/ptg-train-xproj-nova-cinder - L3-18 Summary: - Deprecate the existing swap volume API in Train, remove in U. - Deprecate or straight up remove existing CLI support for the API. - Write up a spec introducing a new API specifically for use by Cinder when retyping or migrating volumes. Potentially using the external events API or policy to lock down access to the API. - Optionally rework the Libvirt virt driver implementation of the API to improve performance and better handle failure cases as suggested by mdbooth. This might include introducing and using a quiesce volume API. I'm personally out for the next two weeks but will start on the above items once back. Cheers, -- Lee Yarwood A5D1 9385 88CB 7E5F BE64 6618 BCA6 6E33 F672 2D76 From valleru at cbio.mskcc.org Mon May 6 13:39:56 2019 From: valleru at cbio.mskcc.org (valleru at cbio.mskcc.org) Date: Mon, 6 May 2019 08:39:56 -0500 Subject: [neutron] Unable to configure multiple external networks In-Reply-To: References: Message-ID: <2b4e0900-dd5b-47a7-a383-dbe0884653a9@Spark> Thank you Slawek, Yes - I see that it is a reported bug. Will keep a track. Regards, Lohit On May 6, 2019, 4:51 AM -0500, Slawomir Kaplonski , wrote: > Hi, > > It is known and already reported issue. 
Please see https://bugs.launchpad.net/neutron/+bug/1824571 > > > On 6 May 2019, at 09:58, valleru at cbio.mskcc.org wrote: > > > > It started to work , after i modified this code: > > > > def _fetch_external_net_id(self, force=False): > > """Find UUID of single external network for this agent.""" > > self.conf.gateway_external_network_id = '' > > #if self.conf.gateway_external_network_id: > > # return self.conf.gateway_external_network_id > > return self.conf.gateway_external_network_id > > > > from https://github.com/openstack/neutron/blob/master/neutron/agent/l3/agent.py > > > > Looks like, that respective option is not being read correctly from the respective configuration file. > > > > Regards, > > Lohit > > > > On May 6, 2019, 2:28 AM -0500, valleru at cbio.mskcc.org, wrote: > > > Hello All, > > > > > > I am trying to install Openstack Stein on a single node, with multiple external networks (both networks are also shared). > > > However, i keep getting the following error in the logs, and the router interfaces show as down. > > > > > > 2019-05-06 02:19:45.046 52175 ERROR neutron.agent.l3.agent > > > 2019-05-06 02:19:45.048 52175 INFO neutron.agent.l3.agent [-] Starting router update for a2ec6c99-944e-408a-945a-dffbe09f65ce, action 3, priority 2 > > > 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent [-] Failed to process compatible router: a2ec6c99-944e-408a-945a-dffbe09f65ce: Exception: The 'gateway_external_network_id' option must be configured for this agent as Neutron has more than one external network. > > > 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent Traceback (most recent call last): > > > 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 701, in _process_routers_if_compatible > > > 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent self._process_router_if_compatible(router) > > > 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 548, in _process_router_if_compatible > > > 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent target_ex_net_id = self._fetch_external_net_id() > > > 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 376, in _fetch_external_net_id > > > 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent raise Exception(msg) > > > 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent Exception: The 'gateway_external_network_id' option must be configured for this agent as Neutron has more than one external network. > > > 2019-05-06 02:19:46.249 52175 ERROR neutron.agent.l3.agent > > > 2019-05-06 02:19:46.252 52175 WARNING neutron.agent.l3.agent [-] Hit retry limit with router update for a2ec6c99-944e-408a-945a-dffbe09f65ce, action 3 > > > 2019-05-06 02:19:46.253 52175 WARNING neutron.agent.l3.agent [-] Info for router a2ec6c99-944e-408a-945a-dffbe09f65ce was not found. Performing router cleanup > > > > > > > > > I have set these parameters to empty, as mentioned in the docs. > > > > > > /etc/neutron/l3_agent.ini > > > > > > gateway_external_network_id = > > > external_network_bridge = > > > interface_driver = openvswitch > > > > > > I tried linuxbridge-agent too,but i could not get rid of the above error. 
> > > > > > openstack port list --router router1 > > > > > > +--------------------------------------+------+-------------------+----------------------------------------------------------------------------+--------+ > > > | ID | Name | MAC Address | Fixed IP Addresses | Status | > > > +--------------------------------------+------+-------------------+----------------------------------------------------------------------------+--------+ > > > | 1bcaad17-17ed-4383-9206-34417f8fd2df | | fa:16:3e:c1:b1:1f | ip_address='192.168.1.1', subnet_id='b00cb3bf-ca89-4e00-8bd7-83a75dbb6080' | DOWN | > > > | f49d976f-b733-4360-9d1f-cdd35ecf54e6 | | fa:16:3e:54:82:4b | ip_address='10.0.10.11', subnet_id='7cc01a33-f078-494d-9b0b-e988f5b4915d' | DOWN | > > > +--------------------------------------+------+-------------------+----------------------------------------------------------------------------+————+ > > > > > > However it does work when i have just one external network > > > > > > openstack port list --router router1 > > > +--------------------------------------+------+-------------------+--------------------------------------------------------------------------------+--------+ > > > | ID | Name | MAC Address | Fixed IP Addresses | Status | > > > +--------------------------------------+------+-------------------+--------------------------------------------------------------------------------+--------+ > > > | cdb06cf7-7492-4275-bd93-88a46b9769a8 | | fa:16:3e:7c:ea:55 | ip_address='192.168.1.1', subnet_id='b00cb3bf-ca89-4e00-8bd7-83a75dbb6080' | ACTIVE | > > > | fc9b06d7-d377-451b-9af5-07e1fab072dc | | fa:16:3e:d0:6d:7c | ip_address='140.163.188.149', subnet_id='4a2bf30a-e7f8-44c1-8b08-4de01b2b1296' | ACTIVE | > > > +--------------------------------------+------+-------------------+--------------------------------------------------------------------------------+--------+ > > > > > > May i please know, how to get the above working. > > > I have seen multiple articles online that mention that this should be working, however i am unable to get this to work. > > > It is really important for us to have to have 2 external networks in the environment, and be able to route to both of them if possible. > > > > > > > > > Thank you, > > > Lohit > > > > > — > Slawek Kaplonski > Senior software engineer > Red Hat > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dms at danplanet.com Mon May 6 15:04:35 2019 From: dms at danplanet.com (Dan Smith) Date: Mon, 06 May 2019 08:04:35 -0700 Subject: [placement][nova][ptg] Summary: Consumer Types In-Reply-To: <1557135206.12068.1@smtp.office365.com> (=?utf-8?Q?=22Bal?= =?utf-8?Q?=C3=A1zs?= Gibizer"'s message of "Mon, 6 May 2019 09:33:36 +0000") References: <1557135206.12068.1@smtp.office365.com> Message-ID: > If there are ongoing migration then defaulting the consumer type to > instance might be incorrect. Right, and you have to assume that there are some in progress. Only Nova has the ability to tell you which consumers are instances or migrations. If we did this before we split, we could have looked at the api db instance mappings to make the determination, but I think now we need to be told via the API which is which. > However nova already has a mechanism to distingush between migration > and instance consumer so nova won't break by this. This would mean placement just lies about what each consumer is, and an operator trying to make sense of an upgrade by dumping stuff with osc-placement won't be able to tell the difference. 
They might be inclined to delete what, to them, would look like a bunch of stale instance allocations. > Still nova might want to fix this placement data inconsistency. I > guess the new placement microversion will allow to update the consumer > type of an allocation. Yeah, I think this has to be updated from Nova. I (and I imagine others) would like to avoid making the type field optional in the API. So maybe default the value to something like "incomplete" or "unknown" and then let nova correct this naturally for instances on host startup and migrations on complete/revert. Ideally nova will be one one of the users that wants to depend on the type string, so we want to use our knowledge of which is which to get existing allocations updated so we can depend on the type value later. --Dan From pierre at stackhpc.com Mon May 6 15:32:37 2019 From: pierre at stackhpc.com (Pierre Riteau) Date: Mon, 6 May 2019 16:32:37 +0100 Subject: [blazar] Scheduling a new Blazar IRC meeting for the Americas In-Reply-To: References: Message-ID: Hello, The new IRC meeting for the Blazar project has been approved: https://review.opendev.org/#/c/656392/ We will meet this Thursday (May 9th) at 1600 UTC, then every two weeks. Everyone is welcome to join. On Tue, 9 Apr 2019 at 16:50, Pierre Riteau wrote: > > Hello, > > Contributors to the Blazar project are currently mostly from Europe or > Asia. Our weekly IRC meeting at 0900 UTC is a good match for this > group. > > To foster more contributions, I would like to schedule another IRC > meeting in the morning for American timezones, probably every two > weeks to start with. > I am thinking of proposing 1600 UTC on either Monday, Tuesday, or > Thursday, which doesn't appear to conflict with closely related > projects (Nova, Placement, Ironic). > > If there is anyone who would like to join but cannot make this time, > or has a preference on which day, please let me know. > I will wait for a few days before requesting a meeting slot. > > Thanks, > Pierre From cdent+os at anticdent.org Mon May 6 15:46:17 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Mon, 6 May 2019 08:46:17 -0700 (PDT) Subject: [placement][ptg] Open Questions In-Reply-To: <1557134030.12068.0@smtp.office365.com> References: <1557134030.12068.0@smtp.office365.com> Message-ID: On Mon, 6 May 2019, Balázs Gibizer wrote: >> * We had several RFE stories already in progress, and have added a >> few more during the PTG. We have not done much in the way of >> prioritizing these. We certainly can't do them all. Here's a link >> to the current RFE stories in the placement group (this includes >> placement, osc-placement and os-*). >> >> https://storyboard.openstack.org/#!/worklist/594 >> >> I've made a simple list of those on an etherpad, please register >> you +1 or -1 (or nothing) on each of those. Keep in mind that >> there are several features in "Update nested provider support to >> address train requirements" and that we've already committed to >> them. > > Did you forget to paste the etherpad link? Whoops, sorry about that. Clearly there have been some long days: https://etherpad.openstack.org/p/placement-ptg-train-rfe-voter Thanks for noticing. 
-- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent tw: @anticdent From cdent+os at anticdent.org Mon May 6 15:49:24 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Mon, 6 May 2019 08:49:24 -0700 (PDT) Subject: [placement][nova][ptg] Summary: Consumer Types In-Reply-To: References: <1557135206.12068.1@smtp.office365.com> Message-ID: On Mon, 6 May 2019, Dan Smith wrote: >> Still nova might want to fix this placement data inconsistency. I >> guess the new placement microversion will allow to update the consumer >> type of an allocation. > > Yeah, I think this has to be updated from Nova. I (and I imagine others) > would like to avoid making the type field optional in the API. So maybe > default the value to something like "incomplete" or "unknown" and then > let nova correct this naturally for instances on host startup and > migrations on complete/revert. Ideally nova will be one one of the users > that wants to depend on the type string, so we want to use our knowledge > of which is which to get existing allocations updated so we can depend > on the type value later. Ah, okay, good. If something like "unknown" is workable I think that's much much better than defaulting to instance. Thanks. -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent tw: @anticdent From doka.ua at gmx.com Mon May 6 15:54:51 2019 From: doka.ua at gmx.com (Volodymyr Litovka) Date: Mon, 6 May 2019 18:54:51 +0300 Subject: [octavia] Error while creating amphora In-Reply-To: References: Message-ID: <0994c2fb-a2c1-89f8-10ca-c3d0d9bf79e2@gmx.com> Hi Michael, regarding file injection vs config_drive - https://specs.openstack.org/openstack/nova-specs/specs/queens/implemented/deprecate-file-injection.html - don't know when this will happen, but you see - people are thinking in this way. On 5/2/19 5:58 PM, Michael Johnson wrote: > Volodymyr, > > It looks like you have enabled "user_data_config_drive" in the > octavia.conf file. Is there a reason you need this? If not, please > set it to False and it will resolve your issue. > > It appears we have a python3 bug in the "user_data_config_drive" > capability. It is not generally used and appears to be missing test > coverage. > > I have opened a story (bug) on your behalf here: > https://storyboard.openstack.org/#!/story/2005553 > > Michael > > On Thu, May 2, 2019 at 4:29 AM Volodymyr Litovka wrote: >> Dear colleagues, >> >> I'm using Openstack Rocky and trying to launch Octavia 4.0.0. 
After all installation steps I've got an error during 'openstack loadbalancer create' with the following log: >> >> DEBUG octavia.controller.worker.tasks.compute_tasks [-] Compute create execute for amphora with id d037721f-2cf9-492e-99cb-0be5874da0f6 execute /opt/openstack/lib/python3.6/site-packages/octavia/controller/worker/tasks/compute_tasks.py:63 >> ERROR octavia.controller.worker.tasks.compute_tasks [-] Compute create for amphora id: d037721f-2cf9-492e-99cb-0be5874da0f6 failed: TypeError: can't concat str to bytes >> ERROR octavia.controller.worker.tasks.compute_tasks Traceback (most recent call last): >> ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/octavia/controller/worker/tasks/compute_tasks.py", line 94, in execute >> ERROR octavia.controller.worker.tasks.compute_tasks config_drive_files) >> ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/octavia/common/jinja/user_data_jinja_cfg.py", line 38, in build_user_data_config >> ERROR octavia.controller.worker.tasks.compute_tasks return self.agent_template.render(user_data=user_data) >> ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/jinja2/asyncsupport.py", line 76, in render >> ERROR octavia.controller.worker.tasks.compute_tasks return original_render(self, *args, **kwargs) >> ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/jinja2/environment.py", line 1008, in render >> ERROR octavia.controller.worker.tasks.compute_tasks return self.environment.handle_exception(exc_info, True) >> ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/jinja2/environment.py", line 780, in handle_exception >> ERROR octavia.controller.worker.tasks.compute_tasks reraise(exc_type, exc_value, tb) >> ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/jinja2/_compat.py", line 37, in reraise >> ERROR octavia.controller.worker.tasks.compute_tasks raise value.with_traceback(tb) >> ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/octavia/common/jinja/templates/user_data_config_drive.template", line 29, in top-level template code >> ERROR octavia.controller.worker.tasks.compute_tasks {{ value|indent(8) }} >> ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/jinja2/filters.py", line 557, in do_indent >> ERROR octavia.controller.worker.tasks.compute_tasks s += u'\n' # this quirk is necessary for splitlines method >> ERROR octavia.controller.worker.tasks.compute_tasks TypeError: can't concat str to bytes >> ERROR octavia.controller.worker.tasks.compute_tasks >> WARNING octavia.controller.worker.controller_worker [-] Task 'STANDALONE-octavia-create-amp-for-lb-subflow-octavia-cert-compute-create' (06134192-def9-420c-9feb-0d08a068f3b2) transitioned into state 'FAILURE' from state 'RUNNING' >> >> Any advises where is the problem? >> >> My environment: >> - Openstack Rocky >> - Ubuntu 18.04 >> - Octavia installed in virtualenv using pip install: >> # pip list |grep octavia >> octavia 4.0.0 >> octavia-lib 1.1.1 >> python-octaviaclient 1.8.0 >> >> Thank you. >> >> -- >> Volodymyr Litovka >> "Vision without Execution is Hallucination." -- Thomas Edison >> >> -- >> Volodymyr Litovka >> "Vision without Execution is Hallucination." 
-- Thomas Edison -- Volodymyr Litovka "Vision without Execution is Hallucination." -- Thomas Edison From miguel at mlavalle.com Mon May 6 15:59:15 2019 From: miguel at mlavalle.com (Miguel Lavalle) Date: Mon, 6 May 2019 10:59:15 -0500 Subject: [qa][ptg][nova][cinder][keystone][neutron][glance][swift][placement] How to make integrated-gate testing (tempest-full) more stable and fast In-Reply-To: <20190506114734.mehzyjf7dhj6mqkr@bishop> References: <16a86db6ccd.d787148123989.2198391414179782565@ghanshyammann.com> <20190506114734.mehzyjf7dhj6mqkr@bishop> Message-ID: Yes, I also like this approach On Mon, May 6, 2019 at 6:48 AM Nate Johnston wrote: > I think this is a really great approach. +1 > > Nate > > On Sun, May 05, 2019 at 02:18:08AM -0500, Ghanshyam Mann wrote: > > Current integrated-gate jobs (tempest-full) is not so stable for various > bugs specially timeout. We tried > > to improve it via filtering the slow tests in the separate tempest-slow > job but the situation has not been improved much. > > > > We talked about the Ideas to make it more stable and fast for projects > especially when failure is not > > related to each project. We are planning to split the integrated-gate > template (only tempest-full job as > > first step) per related services. > > > > Idea: > > - Run only dependent service tests on project gate. > > - Tempest gate will keep running all the services tests as the > integrated gate at a centeralized place without any change in the current > job. > > - Each project can run the below mentioned template. > > - All below template will be defined and maintained by QA team. > > > > I would like to know each 6 services which run integrated-gate jobs > > > > 1."Integrated-gate-networking" (job to run on neutron gate) > > Tests to run in this template: neutron APIs , nova APIs, keystone APIs > ? All scenario currently running in tempest-full in the same way ( means > non-slow and in serial) > > Improvement for neutron gate: exlcude the cinder API tests, glance API > tests, swift API tests, > > > > 2."Integrated-gate-storage" (job to run on cinder gate, glance gate) > > Tests to run in this template: Cinder APIs , Glance APIs, Swift APIs, > Nova APIs and All scenario currently running in tempest-full in the same > way ( means non-slow and in serial) > > Improvement for cinder, glance gate: excluded the neutron APIs tests, > Keystone APIs tests > > > > 3. "Integrated-gate-object-storage" (job to run on swift gate) > > Tests to run in this template: Cinder APIs , Glance APIs, Swift APIs and > All scenario currently running in tempest-full in the same way ( means > non-slow and in serial) > > Improvement for swift gate: excluded the neutron APIs tests, - Keystone > APIs tests, - Nova APIs tests. > > Note: swift does not run integrated-gate as of now. > > > > 4. "Integrated-gate-compute" (job to run on Nova gate) > > tests to run is : Nova APIs, Cinder APIs , Glance APIs ?, neutron APIs > and All scenario currently running in tempest-full in same way ( means > non-slow and in serial) > > Improvement for Nova gate: excluded the swift APIs tests(not running in > current job but in future, it might), Keystone API tests. > > > > 5. "Integrated-gate-identity" (job to run on keystone gate) > > Tests to run is : all as all project use keystone, we might need to run > all tests as it is running in integrated-gate. > > But does keystone is being unsed differently by all services? if no > then, is it enough to run only single service tests say Nova or neutron ? > > > > 6. 
"Integrated-gate-placement" (job to run on placement gate) > > Tests to run in this template: Nova APIs tests, Neutron APIs tests + > scenario tests + any new service depends on placement APIs > > Improvement for placement gate: excluded the glance APIs tests, cinder > APIs tests, swift APIs tests, keystone APIs tests > > > > Thoughts on this approach? > > > > The important point is we must not lose the coverage of integrated > testing per project. So I would like to > > get each project view if we are missing any dependency (proposed tests > removal) in above proposed templates. > > > > - https://etherpad.openstack.org/p/qa-train-ptg > > > > -gmann > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From paye600 at gmail.com Mon May 6 16:50:58 2019 From: paye600 at gmail.com (Roman Gorshunov) Date: Mon, 6 May 2019 18:50:58 +0200 Subject: [tc][all][airship] Github mirroring (or lack thereof) for unofficial projects In-Reply-To: <20190503230525.a3vxsnliklitnei4@arabian.linksys.moosehall> References: <20190503190538.GB3377@localhost.localdomain> <20190503230525.a3vxsnliklitnei4@arabian.linksys.moosehall> Message-ID: Thanks, Adam. I haven't been on PTG, sorry. It's good that there has been a discussion and agreement is reached. Best regards, -- Roman Gorshunov On Sat, May 4, 2019 at 1:05 AM Adam Spiers wrote: > > Paul Belanger wrote: > >On Fri, May 03, 2019 at 08:48:10PM +0200, Roman Gorshunov wrote: > >>Hello Jim, team, > >> > >>I'm from Airship project. I agree with archival of Github mirrors of > >>repositories. > > Which mirror repositories are you referring to here - a subset of the > Airship repos which are no longer needed, or all Airship repo mirrors? > > I would prefer the majority of the mirrors not to be archived, for two > reasons which Alan or maybe Matt noted in the Airship discussions this > morning: > > 1. Some people instinctively go to GitHub search when they > want to find a software project. Having useful search results > for "airship" on GitHub increases the discoverability of the > project. > > 2. Some people will judge the liveness of a project by its > activity metrics as shown on GitHub (e.g. number of recent > commits). An active mirror helps show that the project is > alive and well. In contrast, an archived mirror makes it look > like the project is dead. > > However if you are only talking about a small subset which are no > longer needed, then archiving sounds reasonable. > > >>One small suggestion: could we have project descriptions > >>adjusted to point to the new location of the source code repository, > >>please? E.g. "The repo now lives at opendev.org/x/y". > > I agree it's helpful if the top-level README.rst has a sentence like > "the authoritative location for this repo is https://...". > > >This is something important to keep in mind from infra side, once the > >repo is read-only, we lose the ability to use the API to change it. > > > >From manage-projects.py POV, we can update the description before > >flipping the archive bit without issues, just need to make sure we have > >the ordering correct. > > > >Also, there is no API to unarchive a repo from github sadly, for that a > >human needs to log into github UI and click the button. I have no idea > >why. 
> > Good points, but unless we're talking about a small subset of Airship > repos, I'm a bit puzzled why this is being discussed, because I > thought we reached consensus this morning on a) ensuring that all > Airship projects are continually mirrored to GitHub, and b) trying to > transfer those mirrors from the "openstack" organization to the > "airship" one, assuming we can first persuade GitHub to kick out the > org-squatters. This transferral would mean that GitHub would > automatically redirect requests from > > https://github.com/openstack/airship-* > > to > > https://github.com/airship/... > > Consensus is documented in lines 107-112 of: > > https://etherpad.openstack.org/p/airship-ptg-train From snikitin at mirantis.com Mon May 6 16:59:33 2019 From: snikitin at mirantis.com (Sergey Nikitin) Date: Mon, 6 May 2019 20:59:33 +0400 Subject: [stackalytics] Reported numbers seem inaccurate In-Reply-To: References: Message-ID: Hello Rong, Sorry for long response. I was on a trip during last 5 days. What I have found: Lets take a look on this patch [1]. It must be a contribution of gengchc2, but for some reasons it was matched to Yuval Brik [2] I'm still trying to find a root cause of it, but anyway on this week we are planing to rebuild our database to increase RAM. I checked statistics of gengchc2 on clean database and it's complete correct. So your problem will be solved in several days. It will take so long time because full rebuild of DB takes 48 hours, but we need to test our migration process first to keep zero down time. I'll share a results with you here when the process will be finished. Thank you for your patience. Sergey [1] https://review.opendev.org/#/c/627762/ [2] https://www.stackalytics.com/?user_id=jhamhader&project_type=all&release=all&metric=commits&company=&module=freezer-api On Mon, May 6, 2019 at 6:30 AM Rong Zhu wrote: > Hi Sergey, > > Do we have any process about my colleague's data loss problem? > > Sergey Nikitin 于2019年4月29日 周一19:57写道: > >> Thank you for information! I will take a look >> >> On Mon, Apr 29, 2019 at 3:47 PM Rong Zhu wrote: >> >>> Hi there, >>> >>> Recently we found we lost a person's data from our company at the >>> stackalytics website. >>> You can check the merged patch from [0], but there no date from >>> the stackalytics website. >>> >>> stackalytics info as below: >>> Company: ZTE Corporation >>> Launchpad: 578043796-b >>> Gerrit: gengchc2 >>> >>> Look forward to hearing from you! >>> >> > Best Regards, > Rong Zhu > >> >>> -- > Thanks, > Rong Zhu > -- Best Regards, Sergey Nikitin -------------- next part -------------- An HTML attachment was scrubbed... URL: From miguel at mlavalle.com Mon May 6 17:12:34 2019 From: miguel at mlavalle.com (Miguel Lavalle) Date: Mon, 6 May 2019 12:12:34 -0500 Subject: [openstack-dev] [neutron] Cancelling Neutron weekly meeting on May 7th Message-ID: Dear Neutron Team, Since we just meet during the PTG, we will skip the weekly team meeting on May 7th. We will resume our meetings on the 13th Best regards -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From openstack at fried.cc Mon May 6 18:03:27 2019 From: openstack at fried.cc (Eric Fried) Date: Mon, 6 May 2019 13:03:27 -0500 Subject: [nova][ptg] Summary: Implicit trait-based filters Message-ID: Summary: In keeping with the first proposed cycle theme [1] (though we didn't land on that until later in the PTG), we would like to be able to add required traits to the GET /allocation_candidates query to reduce the number of results returned - i.e. do more filtering in placement rather than in the scheduler (or worse, the compute). You can already do this by explicitly adding required traits to flavor/image; we want to be able to do it implicitly based on things like: - If the instance requires multiattach, make sure it lands on a compute that supports multiattach [2]. - If the image is in X format, make sure it lands on a compute that can read X format [3]. Currently the proposals in [2],[3] work by modifying the RequestSpec.flavor right before select_destinations calls GET /allocation_candidates. This just happens to be okay because we don't persist that copy of the flavor back to the instance (which we wouldn't want to do, since we don't want these implicit additions to e.g. show up when we GET server details, or to affect other lifecycle operations). But this isn't a robust design. What we would like to do instead is exploit the RequestSpec.requested_resources field [4] as it was originally intended, accumulating all the resource/trait/aggregate/etc. criteria from the flavor, image, *and* request_filter-y things like the above. However, gibi started on this [5] and it turns out to be difficult to express the unnumbered request group in that field for... reasons. Action: Since gibi is going to be pretty occupied and unlikely to have time to resolve [5], aspiers has graciously (been) volunteered to take it over; and then follow [2] and [3] to use that mechanism once it's available. efried [1] https://review.opendev.org/#/c/657171/1/priorities/train-priorities.rst at 13 [2] https://review.opendev.org/#/c/645316/ [3] https://review.opendev.org/#/q/topic:bp/request-filter-image-types+(status:open+OR+status:merged) [4] https://opendev.org/openstack/nova/src/commit/5934c5dc6932fbf19ca7f3011c4ccc07b0038ac4/nova/objects/request_spec.py#L93-L100 [5] https://review.opendev.org/#/c/647396/ From ashlee at openstack.org Mon May 6 18:20:00 2019 From: ashlee at openstack.org (Ashlee Ferguson) Date: Mon, 6 May 2019 13:20:00 -0500 Subject: Shanghai Summit Programming Committee Nominations Open Message-ID: <2631F356-5352-41BF-AD86-DF2AB17F349C@openstack.org> Thank you to everyone who attended the Open Infrastructure Summit in Denver. The event was a huge success! If you weren’t able to make it, check out the videos page [1]. Keynotes are up now, and the rest of the sessions will be uploaded in the next week. We’ll also be sharing a Summit recap in the Open Infrastructure Community Newsletter, which you can subscribe to here [2]. The next Summit + PTG will be in Shanghai, November 4 - 6, and the PTG will be November 6 - 8, 2019. Registration and Programming Committee nominations for the Shanghai Open Infrastructure Summit + PTG [3] are open! The Programming Committee helps select the content from the Call for Presentations (CFP) for the Summit schedule. Sessions will be presented in both English and Mandarin, so we will be accepting CFP submissions in both languages. The CFP will open early next week. 
• Nominate yourself or someone else for the Programming Committee [4] before May 20, 2019 • Shanghai Summit + PTG registration is available in the following currencies: • Register in USD [5] • Register in RMB (includes fapiao) [6] Thanks, Ashlee [1] https://www.openstack.org/videos [2] https://www.openstack.org/community/email-signup [3] https://www.openstack.org/summit/shanghai-2019 [4] http://bit.ly/ShanghaiProgrammingCommittee [5] https://app.eventxtra.link/registrations/6640a923-98d7-44c7-a623-1e2c9132b402?locale=en [6] https://app.eventxtra.link/registrations/f564960c-74f6-452d-b0b2-484386d33eb6?locale=en From openstack at fried.cc Mon May 6 18:44:15 2019 From: openstack at fried.cc (Eric Fried) Date: Mon, 6 May 2019 13:44:15 -0500 Subject: [nova][ptg] Summary: Implicit trait-based filters In-Reply-To: Message-ID: Addendum: There's another implicit trait-based filter that bears mentioning: Excluding disabled compute hosts. We have code that disables a compute service when "something goes wrong" in various ways. This code should decorate the compute node's resource provider with a COMPUTE_SERVICE_DISABLED trait, and every GET /allocation_candidates request should include ?required=!COMPUTE_SERVICE_DISABLED, so that we don't retrieve allocation candidates for disabled hosts. mriedem has started to prototype the code for this [1]. Action: Spec to be written. Code to be polished up. Possibly aspiers to be involved in this bit as well. efried [1] https://review.opendev.org/#/c/654596/ From jungleboyj at gmail.com Mon May 6 19:19:36 2019 From: jungleboyj at gmail.com (Jay Bryant) Date: Mon, 6 May 2019 14:19:36 -0500 Subject: [cinder] No weekly meeting this week ... Message-ID: Team, It was discussed at the PTG last week that we would take this week off from our usual team meeting. So, enjoy getting an hour back on Wednesday and we will go back to our regularly scheduled meetings on May 15th . Thanks! Jay From openstack at fried.cc Mon May 6 19:32:10 2019 From: openstack at fried.cc (Eric Fried) Date: Mon, 6 May 2019 14:32:10 -0500 Subject: [nova][ptg] Summary: Server group [anti-]affinity Message-ID: <27bb593b-62c3-9167-59de-d7e6effab9e9@fried.cc> The Issue: Doing server group affinity ("land all these instances on the same host") and anti-affinity ("...on different hosts") on the nova side is problematic in large deployments (like CERN). We'd like to do it on the placement side - i.e. have GET /allocation_candidates return [just the one host to which servers in the group are being deployed] (affinity); or [only hosts on which servers in the group have not yet landed] (anti-affinity). Summary: - Affinity is fairly easy: ?in_tree=. - For anti-affinity, we need something like ?in_tree=!. - The size of in the latter case could quickly get out of hand, exceeding HTTP/wsgi (querystring length) and/or database query (`AND resource_provider.uuid NOT IN `) limits. - Race conditions in both cases are a little tricky. - tssurya to come up with spec(s) for ?in_tree=! and nova usage thereof, wherein discussions of the above issues can occur. - Unclear if this will make Train. efried . From sean.mcginnis at gmx.com Mon May 6 20:08:13 2019 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Mon, 6 May 2019 15:08:13 -0500 Subject: [cinder] Third party CI failures with namespace changes Message-ID: <20190506200813.GA29759@sm-workstation> Just a heads up for third party CI maintainers. 
You should have already noticed this, but we have quite a few that are failing tests right now because the CI systems have not been updated for the new git namespaces. There are several I noticed that are failure trying to clone https://git.openstack.org/openstack-dev/devstack. With all of the changes a couple weeks ago, this should now be from https://opendev.org/openstack/devstack. Please update your CI's to pull from the correct location, or disable them for now until you are able to make the updates. The current barrage of CI failure comments soon after submitting patches are not particularly helpful. Thanks for your prompt attention to this. As a reminder, we have a third party CI policy that impacts in-tree drivers if third party CI's are not maintained and cannot give us useful feedback as to whether a driver is functional or not [0]. Thanks, Sean [0] https://wiki.openstack.org/wiki/Cinder/tested-3rdParty-drivers#Non-Compliance_Policy From openstack at fried.cc Mon May 6 20:12:25 2019 From: openstack at fried.cc (Eric Fried) Date: Mon, 6 May 2019 15:12:25 -0500 Subject: [nova][ptg] Summary: Tech Debt Message-ID: Tech debt items we discussed, and actions to be taken thereon: - Remove cellsv1: (continue to) do it [1] - Remove nova-network: do it - Remove the nova-console, nova-consoleauth, nova-xvpxvncproxy services: do it - Migrating rootwrap to privsep: (continue to) do it [2] - Bump the minimum microversion: don't do it - Remove mox: (continue to) do it [3] It's possible I missed some; if so, please reply. efried [1] https://review.opendev.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/remove-cells-v1 [2] https://review.opendev.org/#/q/project:openstack/nova+branch:master+topic:my-own-personal-alternative-universe [3] https://review.opendev.org/#/q/status:open+project:openstack/nova+branch:master+topic:bp/mox-removal-train From alifshit at redhat.com Mon May 6 20:31:18 2019 From: alifshit at redhat.com (Artom Lifshitz) Date: Mon, 6 May 2019 16:31:18 -0400 Subject: [nova][ptg] Summary: Server group [anti-]affinity In-Reply-To: <27bb593b-62c3-9167-59de-d7e6effab9e9@fried.cc> References: <27bb593b-62c3-9167-59de-d7e6effab9e9@fried.cc> Message-ID: On Mon, May 6, 2019 at 3:35 PM Eric Fried wrote: > - Race conditions in both cases are a little tricky. So we currently have the late group policy check [1] done on the host itself during instance build. It'd be great if we can get rid of the need for it with this work, or at the very least make it very very unlikely to fail. I know it won't be easy though. [1] https://github.com/openstack/nova/blob/eae1f2257a9ee6e851182bf949568b1cfe2af763/nova/compute/manager.py#L1350 > - tssurya to come up with spec(s) for ?in_tree=! and nova usage > thereof, wherein discussions of the above issues can occur. > - Unclear if this will make Train. > > efried > . > > -- Artom Lifshitz Software Engineer, OpenStack Compute DFG From sundar.nadathur at intel.com Mon May 6 21:17:41 2019 From: sundar.nadathur at intel.com (Nadathur, Sundar) Date: Mon, 6 May 2019 21:17:41 +0000 Subject: [cyborg] [ptg] PTG summary Message-ID: <1CC272501B5BC543A05DB90AA509DED5275553E7@fmsmsx122.amr.corp.intel.com> Thanks to all Cyborg, Nova and Ironic developers for the productive PTG. * Cyborg-Nova integration: Demo slides are here [1]. The xproj etherpad with summary of outcomes is [2], and includes the demo slides link at the bottom. (The Cyborg PTG etherpad [3] also contains the link to the slides in Line 26.) 
Good to see that Nova has made Cyborg integration as a cycle theme [4]. * Major goals for Train release [5] were unanimously agreed upon. * Mapping names to IDs: All agreed on the need. The discussion on how to do that needs to be completed. By using alternative mechanisms for function IDs and region IDs, we could potentially avoid the need for a new API. * Good discussions on a variety of other topics, as can be seen in [3], but they need follow-up. * Owners identified for most of the ToDo tasks [3]. * In offline conversations after the PTG, ZTE developers have agreed to help with getting tempest CI started, to be followed up by others later. * Cyborg-Ironic cross-project [6]: Good discussion. The need for the integration was understood: between the types of bare metal servers, and varying number/types of accelerators, there is a combinatorial explosion of the number of variations; Cyborg can help address that. Need to write a spec for the approach. Next steps: * Cyborg/Nova integration: o Drive Nova spec to closure, write and merge some Cyborg specs (device profiles, REST API), merge Cyborg pilot code into master, incorporate some feedback in Nova patches. o Set up tempest CI, with a real or fake device. o Only after both steps above are done will Nova patches get merged. * Need to write a bunch of specs: Cyborg (REST API spec, driver API spec?), DDP-related specs, Ironic+Cyborg spec. * Complete the discussions on remaining items. [1] https://docs.google.com/presentation/d/1uHP2kVLiZVuRcdeCI8QaNDJILjix9VCO7wagTBuWGyQ/edit?usp=sharing [2] https://etherpad.openstack.org/p/ptg-train-xproj-nova-cyborg [3] https://etherpad.openstack.org/p/cyborg-ptg-train [4] http://lists.openstack.org/pipermail/openstack-discuss/2019-May/005867.html [5] https://etherpad.openstack.org/p/cyborg-train-goals [6] https://etherpad.openstack.org/p/ptg-train-xproj-ironic-cyborg Regards, Sundar -------------- next part -------------- An HTML attachment was scrubbed... URL: From jp.methot at planethoster.info Mon May 6 21:56:50 2019 From: jp.methot at planethoster.info (=?utf-8?Q?Jean-Philippe_M=C3=A9thot?=) Date: Mon, 6 May 2019 17:56:50 -0400 Subject: [ops][nova]Logging in nova and other openstack projects Message-ID: Hi, We’ve been modifying our login habits for Nova on our Openstack setup to try to send only warning level and up logs to our log servers. To do so, I’ve created a logging.conf and configured logging according to the logging module documentation. While what I’ve done works, it seems to be a very convoluted process for something as simple as changing the logging level to warning. We worry that if we upgrade and the syntax for this configuration file changes, we may have to push more changes through ansible than we would like to. Is there an easier way to set the nova logs to warning level and up than making an additional config file for the python logging module? Best regards, Jean-Philippe Méthot Openstack system administrator Administrateur système Openstack PlanetHoster inc. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From morgan.fainberg at gmail.com Mon May 6 22:06:23 2019 From: morgan.fainberg at gmail.com (Morgan Fainberg) Date: Mon, 6 May 2019 15:06:23 -0700 Subject: [qa][ptg][nova][cinder][keystone][neutron][glance][swift][placement] How to make integrated-gate testing (tempest-full) more stable and fast In-Reply-To: <16a86db6ccd.d787148123989.2198391414179782565@ghanshyammann.com> References: <16a86db6ccd.d787148123989.2198391414179782565@ghanshyammann.com> Message-ID: On Sun, May 5, 2019 at 12:19 AM Ghanshyam Mann wrote: > Current integrated-gate jobs (tempest-full) is not so stable for various > bugs specially timeout. We tried > to improve it via filtering the slow tests in the separate tempest-slow > job but the situation has not been improved much. > > We talked about the Ideas to make it more stable and fast for projects > especially when failure is not > related to each project. We are planning to split the integrated-gate > template (only tempest-full job as > first step) per related services. > > Idea: > - Run only dependent service tests on project gate. > - Tempest gate will keep running all the services tests as the integrated > gate at a centeralized place without any change in the current job. > - Each project can run the below mentioned template. > - All below template will be defined and maintained by QA team. > > I would like to know each 6 services which run integrated-gate jobs > > 1."Integrated-gate-networking" (job to run on neutron gate) > Tests to run in this template: neutron APIs , nova APIs, keystone APIs ? > All scenario currently running in tempest-full in the same way ( means > non-slow and in serial) > Improvement for neutron gate: exlcude the cinder API tests, glance API > tests, swift API tests, > > 2."Integrated-gate-storage" (job to run on cinder gate, glance gate) > Tests to run in this template: Cinder APIs , Glance APIs, Swift APIs, Nova > APIs and All scenario currently running in tempest-full in the same way ( > means non-slow and in serial) > Improvement for cinder, glance gate: excluded the neutron APIs tests, > Keystone APIs tests > > 3. "Integrated-gate-object-storage" (job to run on swift gate) > Tests to run in this template: Cinder APIs , Glance APIs, Swift APIs and > All scenario currently running in tempest-full in the same way ( means > non-slow and in serial) > Improvement for swift gate: excluded the neutron APIs tests, - Keystone > APIs tests, - Nova APIs tests. > Note: swift does not run integrated-gate as of now. > > 4. "Integrated-gate-compute" (job to run on Nova gate) > tests to run is : Nova APIs, Cinder APIs , Glance APIs ?, neutron APIs and > All scenario currently running in tempest-full in same way ( means non-slow > and in serial) > Improvement for Nova gate: excluded the swift APIs tests(not running in > current job but in future, it might), Keystone API tests. > > 5. "Integrated-gate-identity" (job to run on keystone gate) > Tests to run is : all as all project use keystone, we might need to run > all tests as it is running in integrated-gate. > But does keystone is being unsed differently by all services? if no then, > is it enough to run only single service tests say Nova or neutron ? > > 6. 
"Integrated-gate-placement" (job to run on placement gate) > Tests to run in this template: Nova APIs tests, Neutron APIs tests + > scenario tests + any new service depends on placement APIs > Improvement for placement gate: excluded the glance APIs tests, cinder > APIs tests, swift APIs tests, keystone APIs tests > > Thoughts on this approach? > > The important point is we must not lose the coverage of integrated testing > per project. So I would like to > get each project view if we are missing any dependency (proposed tests > removal) in above proposed templates. > > - https://etherpad.openstack.org/p/qa-train-ptg > > -gmann > > > For the "Integrated-gate-identity", I have a slight worry that we might lose some coverage with this change. I am unsure of how varied the use of Keystone is outside of KeystoneMiddleware (i.e. token validation) consumption that all services perform, Heat (not part of the integrated gate) and it's usage of Trusts, and some newer emerging uses such as "look up limit data" (potentially in Train, would be covered by Nova). Worst case, we could run all the integrated tests for Keystone changes (at least initially) until we have higher confidence and minimize the tests once we have a clearer audit of how the services use Keystone. The changes would speed up/minimize the usage for the other services directly and Keystone can follow down the line. I want to be as close to 100% sure we're not going to suddenly break everyone because of some change we land. Keystone fortunately and unfortunately sits below most other services in an OpenStack deployment and is heavily relied throughout almost every single request. --Morgan -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim at swiftstack.com Mon May 6 23:25:11 2019 From: tim at swiftstack.com (Tim Burke) Date: Mon, 6 May 2019 16:25:11 -0700 Subject: [qa][ptg][nova][cinder][keystone][neutron][glance][swift][placement] How to make integrated-gate testing (tempest-full) more stable and fast In-Reply-To: <16a86db6ccd.d787148123989.2198391414179782565@ghanshyammann.com> References: <16a86db6ccd.d787148123989.2198391414179782565@ghanshyammann.com> Message-ID: On 5/5/19 12:18 AM, Ghanshyam Mann wrote: > Current integrated-gate jobs (tempest-full) is not so stable for various bugs specially timeout. We tried > to improve it via filtering the slow tests in the separate tempest-slow job but the situation has not been improved much. > > We talked about the Ideas to make it more stable and fast for projects especially when failure is not > related to each project. We are planning to split the integrated-gate template (only tempest-full job as > first step) per related services. > > Idea: > - Run only dependent service tests on project gate. I love this plan already. > - Tempest gate will keep running all the services tests as the integrated gate at a centeralized place without any change in the current job. > - Each project can run the below mentioned template. > - All below template will be defined and maintained by QA team. My biggest regret is that I couldn't figure out how to do this myself. Much thanks to the QA team! > > I would like to know each 6 services which run integrated-gate jobs > > 1."Integrated-gate-networking" (job to run on neutron gate) > Tests to run in this template: neutron APIs , nova APIs, keystone APIs ? 
All scenario currently running in tempest-full in the same way ( means non-slow and in serial) > Improvement for neutron gate: exlcude the cinder API tests, glance API tests, swift API tests, > > 2."Integrated-gate-storage" (job to run on cinder gate, glance gate) > Tests to run in this template: Cinder APIs , Glance APIs, Swift APIs, Nova APIs and All scenario currently running in tempest-full in the same way ( means non-slow and in serial) > Improvement for cinder, glance gate: excluded the neutron APIs tests, Keystone APIs tests > > 3. "Integrated-gate-object-storage" (job to run on swift gate) > Tests to run in this template: Cinder APIs , Glance APIs, Swift APIs and All scenario currently running in tempest-full in the same way ( means non-slow and in serial) > Improvement for swift gate: excluded the neutron APIs tests, - Keystone APIs tests, - Nova APIs tests. This sounds great. My only question is why Cinder tests are still included, but I trust that it's there for a reason and I'm just revealing my own ignorance of Swift's consumers, however removed. > Note: swift does not run integrated-gate as of now. Correct, and for all the reasons that you're seeking to address. Some eight months ago I'd gotten tired of seeing spurious failures that had nothing to do with Swift, and I was hard pressed to find an instance where the tempest tests caught a regression or behavior change that wasn't already caught by Swift's own functional tests. In short, the signal-to-noise ratio for those particular tests was low enough that a failure only told me "you should leave a recheck comment," so I proposed https://review.opendev.org/#/c/601813/ . There was also a side benefit of having our longest-running job change from legacy-tempest-dsvm-neutron-full (at 90-100 minutes) to swift-probetests-centos-7 (at ~30 minutes), tightening developer feedback loops. It sounds like this proposal addresses both concerns: by reducing the scope of tests to what might actually exercise the Swift API (if indirectly), the signal-to-noise ratio should be much better and the wall-clock time will be reduced. > > 4. "Integrated-gate-compute" (job to run on Nova gate) > tests to run is : Nova APIs, Cinder APIs , Glance APIs ?, neutron APIs and All scenario currently running in tempest-full in same way ( means non-slow and in serial) > Improvement for Nova gate: excluded the swift APIs tests(not running in current job but in future, it might), Keystone API tests. > > 5. "Integrated-gate-identity" (job to run on keystone gate) > Tests to run is : all as all project use keystone, we might need to run all tests as it is running in integrated-gate. > But does keystone is being unsed differently by all services? if no then, is it enough to run only single service tests say Nova or neutron ? > > 6. "Integrated-gate-placement" (job to run on placement gate) > Tests to run in this template: Nova APIs tests, Neutron APIs tests + scenario tests + any new service depends on placement APIs > Improvement for placement gate: excluded the glance APIs tests, cinder APIs tests, swift APIs tests, keystone APIs tests > > Thoughts on this approach? > > The important point is we must not lose the coverage of integrated testing per project. So I would like to > get each project view if we are missing any dependency (proposed tests removal) in above proposed templates. As far as Swift is aware, these dependencies seem accurate; at any rate, *we* don't use anything other than Keystone, even by way of another API. 
Further, Swift does not use particularly esoteric Keysonte APIs; I would be OK with integrated-gate-identity not exercising Swift's API with the assumption that some other (or indeed, almost *any* other) service would likely exercise the parts that we care about. > > - https:/etherpad.openstack.org/p/qa-train-ptg > > -gmann > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nate.johnston at redhat.com Tue May 7 03:11:59 2019 From: nate.johnston at redhat.com (Nate Johnston) Date: Mon, 6 May 2019 23:11:59 -0400 Subject: [neutron] bug deputy notes 2019-04-29 - 2019-05-06 Message-ID: <20190507031159.iwvgpme7gdonwjh3@bishop> Neutrinos, It was a quiet week with the summit and PTG. All reported bugs have a fix in progress except for 1827363. High: - "snat gateway port may stay 4095 after router fully initialized in l3 agent" * https://bugs.launchpad.net/bugs/1827754 * Fix in progress - "Network won't be synced when create a new network node" * https://bugs.launchpad.net/bugs/1827771 * Fix in progress - "Additional port list / get_ports() failures when filtering and limiting at the same time" * https://bugs.launchpad.net/neutron/+bug/1827363 Low: - "Remove deprecated SR-IOV devstack file" * https://bugs.launchpad.net/neutron/+bug/1827089 * Fix merged - "Routed provider networks in neutron - placement CLI example" * https://bugs.launchpad.net/bugs/1827418 * Fix in progress - "Wrong IPV6 address provided by openstack server create" * https://bugs.launchpad.net/neutron/+bug/1827489 * Fix merged Thanks, Nate From marcin.juszkiewicz at linaro.org Tue May 7 06:42:09 2019 From: marcin.juszkiewicz at linaro.org (Marcin Juszkiewicz) Date: Tue, 7 May 2019 08:42:09 +0200 Subject: [kolla][neutron] Python3 issue: "TypeError: Unicode-objects must be encoded before hashing" Message-ID: <1d56ad05-9fa4-16b7-5cbe-af5c339f58b1@linaro.org> I am working on making Kolla images Python 3 only. So far images are py3 but then there are issues during deployment phase which I do not know how to solve. https://review.opendev.org/#/c/642375/ is a patch. 'kolla-ansible-ubuntu-source' CI job deploys using Ubuntu 18.04 based images. And fails. Log [1] shows something which looks like 'works in py2, not tested with py3' code: 1. 
http://logs.openstack.org/75/642375/19/check/kolla-ansible-ubuntu-source/40878ed/primary/logs/ansible/deploy "+++ neutron-db-manage --config-file /etc/neutron/neutron.conf --config-file /etc/neutron/plugins/ml2/ml2_conf.ini upgrade head", "INFO [alembic.runtime.migration] Context impl MySQLImpl.", "INFO [alembic.runtime.migration] Will assume non-transactional DDL.", "INFO [alembic.runtime.migration] Context impl MySQLImpl.", "INFO [alembic.runtime.migration] Will assume non-transactional DDL.", "INFO [alembic.runtime.migration] Running upgrade -> kilo", "INFO [alembic.runtime.migration] Running upgrade kilo -> 354db87e3225", "INFO [alembic.runtime.migration] Running upgrade 354db87e3225 -> 599c6a226151", "INFO [alembic.runtime.migration] Running upgrade 599c6a226151 -> 52c5312f6baf", "INFO [alembic.runtime.migration] Running upgrade 52c5312f6baf -> 313373c0ffee", "INFO [alembic.runtime.migration] Running upgrade 313373c0ffee -> 8675309a5c4f", "INFO [alembic.runtime.migration] Running upgrade 8675309a5c4f -> 45f955889773", "INFO [alembic.runtime.migration] Running upgrade 45f955889773 -> 26c371498592", "INFO [alembic.runtime.migration] Running upgrade 26c371498592 -> 1c844d1677f7", "INFO [alembic.runtime.migration] Running upgrade 1c844d1677f7 -> 1b4c6e320f79", "INFO [alembic.runtime.migration] Running upgrade 1b4c6e320f79 -> 48153cb5f051", "INFO [alembic.runtime.migration] Running upgrade 48153cb5f051 -> 9859ac9c136", "INFO [alembic.runtime.migration] Running upgrade 9859ac9c136 -> 34af2b5c5a59", "INFO [alembic.runtime.migration] Running upgrade 34af2b5c5a59 -> 59cb5b6cf4d", "INFO [alembic.runtime.migration] Running upgrade 59cb5b6cf4d -> 13cfb89f881a", "INFO [alembic.runtime.migration] Running upgrade 13cfb89f881a -> 32e5974ada25", "INFO [alembic.runtime.migration] Running upgrade 32e5974ada25 -> ec7fcfbf72ee", "INFO [alembic.runtime.migration] Running upgrade ec7fcfbf72ee -> dce3ec7a25c9", "INFO [alembic.runtime.migration] Running upgrade dce3ec7a25c9 -> c3a73f615e4", "INFO [alembic.runtime.migration] Running upgrade c3a73f615e4 -> 659bf3d90664", "INFO [alembic.runtime.migration] Running upgrade 659bf3d90664 -> 1df244e556f5", "INFO [alembic.runtime.migration] Running upgrade 1df244e556f5 -> 19f26505c74f", "INFO [alembic.runtime.migration] Running upgrade 19f26505c74f -> 15be73214821", "INFO [alembic.runtime.migration] Running upgrade 15be73214821 -> b4caf27aae4", "INFO [alembic.runtime.migration] Running upgrade b4caf27aae4 -> 15e43b934f81", "INFO [alembic.runtime.migration] Running upgrade 15e43b934f81 -> 31ed664953e6", "INFO [alembic.runtime.migration] Running upgrade 31ed664953e6 -> 2f9e956e7532", "INFO [alembic.runtime.migration] Running upgrade 2f9e956e7532 -> 3894bccad37f", "INFO [alembic.runtime.migration] Running upgrade 3894bccad37f -> 0e66c5227a8a", "INFO [alembic.runtime.migration] Running upgrade 0e66c5227a8a -> 45f8dd33480b", "INFO [alembic.runtime.migration] Running upgrade 45f8dd33480b -> 5abc0278ca73", "INFO [alembic.runtime.migration] Running upgrade 5abc0278ca73 -> d3435b514502", "INFO [alembic.runtime.migration] Running upgrade d3435b514502 -> 30107ab6a3ee", "INFO [alembic.runtime.migration] Running upgrade 30107ab6a3ee -> c415aab1c048", "INFO [alembic.runtime.migration] Running upgrade c415aab1c048 -> a963b38d82f4", "INFO [alembic.runtime.migration] Running upgrade kilo -> 30018084ec99", "INFO [alembic.runtime.migration] Running upgrade 30018084ec99 -> 4ffceebfada", "INFO [alembic.runtime.migration] Running upgrade 4ffceebfada -> 5498d17be016", "INFO 
[alembic.runtime.migration] Running upgrade 5498d17be016 -> 2a16083502f3", "INFO [alembic.runtime.migration] Running upgrade 2a16083502f3 -> 2e5352a0ad4d", "INFO [alembic.runtime.migration] Running upgrade 2e5352a0ad4d -> 11926bcfe72d", "INFO [alembic.runtime.migration] Running upgrade 11926bcfe72d -> 4af11ca47297", "INFO [alembic.runtime.migration] Running upgrade 4af11ca47297 -> 1b294093239c", "INFO [alembic.runtime.migration] Running upgrade 1b294093239c -> 8a6d8bdae39", "INFO [alembic.runtime.migration] Running upgrade 8a6d8bdae39 -> 2b4c2465d44b", "INFO [alembic.runtime.migration] Running upgrade 2b4c2465d44b -> e3278ee65050", "INFO [alembic.runtime.migration] Running upgrade e3278ee65050 -> c6c112992c9", "INFO [alembic.runtime.migration] Running upgrade c6c112992c9 -> 5ffceebfada", "INFO [alembic.runtime.migration] Running upgrade 5ffceebfada -> 4ffceebfcdc", "INFO [alembic.runtime.migration] Running upgrade 4ffceebfcdc -> 7bbb25278f53", "INFO [alembic.runtime.migration] Running upgrade 7bbb25278f53 -> 89ab9a816d70", "INFO [alembic.runtime.migration] Running upgrade 89ab9a816d70 -> c879c5e1ee90", "INFO [alembic.runtime.migration] Running upgrade c879c5e1ee90 -> 8fd3918ef6f4", "INFO [alembic.runtime.migration] Running upgrade 8fd3918ef6f4 -> 4bcd4df1f426", "INFO [alembic.runtime.migration] Running upgrade 4bcd4df1f426 -> b67e765a3524", "INFO [alembic.runtime.migration] Running upgrade a963b38d82f4 -> 3d0e74aa7d37", "INFO [alembic.runtime.migration] Running upgrade 3d0e74aa7d37 -> 030a959ceafa", "INFO [alembic.runtime.migration] Running upgrade 030a959ceafa -> a5648cfeeadf", "INFO [alembic.runtime.migration] Running upgrade a5648cfeeadf -> 0f5bef0f87d4", "INFO [alembic.runtime.migration] Running upgrade 0f5bef0f87d4 -> 67daae611b6e", "INFO [alembic.runtime.migration] Running upgrade 67daae611b6e -> 6b461a21bcfc", "INFO [alembic.runtime.migration] Running upgrade 6b461a21bcfc -> 5cd92597d11d", "INFO [alembic.runtime.migration] Running upgrade 5cd92597d11d -> 929c968efe70", "INFO [alembic.runtime.migration] Running upgrade 929c968efe70 -> a9c43481023c", "INFO [alembic.runtime.migration] Running upgrade a9c43481023c -> 804a3c76314c", "INFO [alembic.runtime.migration] Running upgrade 804a3c76314c -> 2b42d90729da", "INFO [alembic.runtime.migration] Running upgrade 2b42d90729da -> 62c781cb6192", "INFO [alembic.runtime.migration] Running upgrade 62c781cb6192 -> c8c222d42aa9", "INFO [alembic.runtime.migration] Running upgrade c8c222d42aa9 -> 349b6fd605a6", "INFO [alembic.runtime.migration] Running upgrade 349b6fd605a6 -> 7d32f979895f", "INFO [alembic.runtime.migration] Running upgrade 7d32f979895f -> 594422d373ee", "INFO [alembic.runtime.migration] Running upgrade 594422d373ee -> 61663558142c", "INFO [alembic.runtime.migration] Running upgrade 61663558142c -> 867d39095bf4, port forwarding", "INFO [alembic.runtime.migration] Running upgrade 867d39095bf4 -> d72db3e25539, modify uniq port forwarding", "INFO [alembic.runtime.migration] Running upgrade d72db3e25539 -> cada2437bf41", "INFO [alembic.runtime.migration] Running upgrade cada2437bf41 -> 195176fb410d, router gateway IP QoS", "INFO [alembic.runtime.migration] Running upgrade 195176fb410d -> fb0167bd9639", "INFO [alembic.runtime.migration] Running upgrade fb0167bd9639 -> 0ff9e3881597", "INFO [alembic.runtime.migration] Running upgrade 0ff9e3881597 -> 9bfad3f1e780", "INFO [alembic.runtime.migration] Running upgrade b67e765a3524 -> a84ccf28f06a", "INFO [alembic.runtime.migration] Running upgrade a84ccf28f06a -> 7d9d8eeec6ad", "INFO 
[alembic.runtime.migration] Running upgrade 7d9d8eeec6ad -> a8b517cff8ab", "INFO [alembic.runtime.migration] Running upgrade a8b517cff8ab -> 3b935b28e7a0", "INFO [alembic.runtime.migration] Running upgrade 3b935b28e7a0 -> b12a3ef66e62", "INFO [alembic.runtime.migration] Running upgrade b12a3ef66e62 -> 97c25b0d2353", "INFO [alembic.runtime.migration] Running upgrade 97c25b0d2353 -> 2e0d7a8a1586", "INFO [alembic.runtime.migration] Running upgrade 2e0d7a8a1586 -> 5c85685d616d", "INFO [alembic.runtime.migration] Context impl MySQLImpl.", "INFO [alembic.runtime.migration] Will assume non-transactional DDL.", "Traceback (most recent call last):", " File \"/var/lib/kolla/venv/bin/neutron-db-manage\", line 10, in ", " sys.exit(main())", " File \"/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/db/migration/cli.py\", line 657, in main", " return_val |= bool(CONF.command.func(config, CONF.command.name))", " File \"/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/db/migration/cli.py\", line 179, in do_upgrade", " run_sanity_checks(config, revision)", " File \"/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/db/migration/cli.py\", line 641, in run_sanity_checks", " script_dir.run_env()", " File \"/var/lib/kolla/venv/lib/python3.6/site-packages/alembic/script/base.py\", line 475, in run_env", " util.load_python_file(self.dir, \"env.py\")", " File \"/var/lib/kolla/venv/lib/python3.6/site-packages/alembic/util/pyfiles.py\", line 90, in load_python_file", " module = load_module_py(module_id, path)", " File \"/var/lib/kolla/venv/lib/python3.6/site-packages/alembic/util/compat.py\", line 156, in load_module_py", " spec.loader.exec_module(module)", " File \"\", line 678, in exec_module", " File \"\", line 219, in _call_with_frames_removed", " File \"/var/lib/kolla/venv/lib/python3.6/site-packages/networking_infoblox/neutron/db/migration/alembic_migrations/env.py\", line 88, in ", " run_migrations_online()", " File \"/var/lib/kolla/venv/lib/python3.6/site-packages/networking_infoblox/neutron/db/migration/alembic_migrations/env.py\", line 79, in run_migrations_online", " context.run_migrations()", " File \"\", line 8, in run_migrations", " File \"/var/lib/kolla/venv/lib/python3.6/site-packages/alembic/runtime/environment.py\", line 839, in run_migrations", " self.get_context().run_migrations(**kw)", " File \"/var/lib/kolla/venv/lib/python3.6/site-packages/alembic/runtime/migration.py\", line 350, in run_migrations", " for step in self._migrations_fn(heads, self):", " File \"/var/lib/kolla/venv/lib/python3.6/site-packages/neutron/db/migration/cli.py\", line 632, in check_sanity", " revision, rev, implicit_base=True):", " File \"/var/lib/kolla/venv/lib/python3.6/site-packages/alembic/script/revision.py\", line 767, in _iterate_revisions", " uppers = util.dedupe_tuple(self.get_revisions(upper))", " File \"/var/lib/kolla/venv/lib/python3.6/site-packages/alembic/script/revision.py\", line 321, in get_revisions", " resolved_id, branch_label = self._resolve_revision_number(id_)", " File \"/var/lib/kolla/venv/lib/python3.6/site-packages/alembic/script/revision.py\", line 491, in _resolve_revision_number", " self._revision_map", " File \"/var/lib/kolla/venv/lib/python3.6/site-packages/alembic/util/langhelpers.py\", line 230, in __get__", " obj.__dict__[self.__name__] = result = self.fget(obj)", " File \"/var/lib/kolla/venv/lib/python3.6/site-packages/alembic/script/revision.py\", line 123, in _revision_map", " for revision in self._generator():", " File 
\"/var/lib/kolla/venv/lib/python3.6/site-packages/alembic/script/base.py\", line 109, in _load_revisions", " script = Script._from_filename(self, vers, file_)", " File \"/var/lib/kolla/venv/lib/python3.6/site-packages/alembic/script/base.py\", line 887, in _from_filename", " module = util.load_python_file(dir_, filename)", " File \"/var/lib/kolla/venv/lib/python3.6/site-packages/alembic/util/pyfiles.py\", line 90, in load_python_file", " module = load_module_py(module_id, path)", " File \"/var/lib/kolla/venv/lib/python3.6/site-packages/alembic/util/compat.py\", line 156, in load_module_py", " spec.loader.exec_module(module)", " File \"\", line 678, in exec_module", " File \"\", line 219, in _call_with_frames_removed", " File \"/var/lib/kolla/venv/lib/python3.6/site-packages/networking_infoblox/neutron/db/migration/alembic_migrations/versions/4d0bb1d080f8_member_sync_improvement.py\", line 43, in ", " default=utils.get_hash()))", " File \"/var/lib/kolla/venv/lib/python3.6/site-packages/networking_infoblox/neutron/common/utils.py\", line 374, in get_hash", " return hashlib.md5(str(time.time())).hexdigest()", "TypeError: Unicode-objects must be encoded before hashing" ], Any ideas which project goes wrong? And how/where to fix it? From balazs.gibizer at ericsson.com Tue May 7 07:19:55 2019 From: balazs.gibizer at ericsson.com (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Tue, 7 May 2019 07:19:55 +0000 Subject: [nova][ptg] Summary: Implicit trait-based filters In-Reply-To: References: Message-ID: <1557213589.2232.0@smtp.office365.com> On Mon, May 6, 2019 at 8:03 PM, Eric Fried wrote: > Summary: > In keeping with the first proposed cycle theme [1] (though we didn't > land on that until later in the PTG), we would like to be able to add > required traits to the GET /allocation_candidates query to reduce the > number of results returned - i.e. do more filtering in placement > rather > than in the scheduler (or worse, the compute). You can already do this > by explicitly adding required traits to flavor/image; we want to be > able > to do it implicitly based on things like: > - If the instance requires multiattach, make sure it lands on a > compute > that supports multiattach [2]. > - If the image is in X format, make sure it lands on a compute that > can > read X format [3]. > > Currently the proposals in [2],[3] work by modifying the > RequestSpec.flavor right before select_destinations calls GET > /allocation_candidates. This just happens to be okay because we don't > persist that copy of the flavor back to the instance (which we > wouldn't > want to do, since we don't want these implicit additions to e.g. show > up > when we GET server details, or to affect other lifecycle operations). > > But this isn't a robust design. > > What we would like to do instead is exploit the > RequestSpec.requested_resources field [4] as it was originally > intended, > accumulating all the resource/trait/aggregate/etc. criteria from the > flavor, image, *and* request_filter-y things like the above. However, > gibi started on this [5] and it turns out to be difficult to express > the > unnumbered request group in that field for... reasons. Sorry that I was not able to describe the problems with the approach on the PTG. I will try now in a mail. So this patch [5] tries to create the unnumbered group in RequestSpec.requested_resources based on the other fields (flavor, image ..) 
in the RequestSpec early enough that the above-mentioned pre-filters can add traits to this group instead of adding them to the flavor extra_specs.

The current sequence is the following:

* The RequestSpec is created in three different ways:
  1) RequestSpec.from_components(): used during server create (and cold migrate if a legacy compute is present)
  2) RequestSpec.from_primitives(): deprecated but still used during re-schedule
  3) RequestSpec.__init__(): oslo OVO deepcopy calls __init__ then copies over every field one by one.
* Before the nova scheduler sends the Placement a_c query it calls nova.scheduler.utils.resources_from_request_spec(RequestSpec); that code uses the RequestSpec fields and collects all the request groups and all the other parameters (e.g. limit, group_policy).

What we would need at the end:

* When the RequestSpec is created in any way, we need to populate the RequestSpec.requested_resources field based on the other RequestSpec fields. Note that __init__ cannot be used for this, as all three instantiations of the object create an empty object first with __init__ and then populate the fields one by one.
* When any of the interesting fields (flavor, image, is_bfv, force_*, ...) is updated on the RequestSpec, the request groups in RequestSpec.requested_resources need to be updated to reflect the change. However, we have to be careful not to blindly re-generate such data, as the unnumbered group might already contain traits that are not coming from any of these direct sources but from the above-mentioned implicit required traits code paths.
* When the Placement a_c query is generated, it needs to be generated from RequestSpec.requested_resources.

There are a couple of problems:

1) Detecting a change of a RequestSpec field cannot be done by wrapping the field in a property, due to OVO limitations [6]. Even if it were possible, the way we create the RequestSpec object (init an empty object, then set fields one by one) means the field setters might be called on an incomplete object.

2) Regeneration of RequestSpec.requested_resources would need to distinguish between data that can be regenerated from the other fields of the RequestSpec and the traits added from outside (implicit required traits).

3) The request pre-filters [7] run before the placement a_c query is generated. But today these change the fields of the RequestSpec (e.g. requested_destination), which would mean the regeneration of RequestSpec.requested_resources would be needed. This is probably solvable by changing the pre-filters to work directly on RequestSpec.requested_resources after we have solved all the other issues.

4) The numbered request groups can come from multiple places. When a group comes from the Flavor the number is stable, as provided by the person who created the Flavor. But when it comes from a Neutron port the number is generated (the next unoccupied int), so a re-generation of such groups would potentially re-number them. This makes debugging hard, as well as mapping a numbered group back to the entity that requested the resource (the port) after allocation. This is probably solvable by using the proposed placement extension that allows a string in the numbered group name instead of just a single int [8]. That way the port UUID can be used as the identity for the numbered group, making the identity stable (see the sketch below).
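To make point 4) concrete, here is a purely hypothetical sketch of what a port-keyed group could look like. It assumes the string-suffix extension proposed in [8] is available, and every name in it (the port UUID, resource class and trait values) is made up for illustration only:

    # Hypothetical sketch only -- assumes placement accepts string suffixes
    # for granular ("numbered") request groups, per the proposal in [8].
    # The port UUID is used as the group suffix so the group can always be
    # mapped back to the port that asked for the resources, even if the
    # groups are regenerated.
    port_uuid = "8f7e0733-9f10-4c5d-9abc-2f64cafe0042"  # made-up port UUID

    query_params = {
        "resources": "VCPU:2,MEMORY_MB:4096",
        # bandwidth request carried by the port, keyed by the port itself
        "resources_%s" % port_uuid: "NET_BW_EGR_KILOBIT_PER_SEC:1000",
        "required_%s" % port_uuid: "CUSTOM_PHYSNET_PHYSNET0",
        "group_policy": "isolate",
    }

With a stable, string-based identity like this, re-generating the groups would not re-number them, and an allocation could be traced back to the requesting port.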
Cheers, gibi [6] https://bugs.launchpad.net/oslo.versionedobjects/+bug/1821619 [7] https://github.com/openstack/nova/blob/master/nova/scheduler/request_filter.py [8] https://storyboard.openstack.org/#!/story/2005575 > > Action: > Since gibi is going to be pretty occupied and unlikely to have time to > resolve [5], aspiers has graciously (been) volunteered to take it > over; > and then follow [2] and [3] to use that mechanism once it's available. Aspier, ping me if you want to talk about these in IRC. Cheers, gibi > > efried > > [1] > https://protect2.fireeye.com/url?k=07226944-5ba84bad-072229df-0cc47ad93e2e-db879b26751dd159&u=https://review.opendev.org/#/c/657171/1/priorities/train-priorities.rst at 13 > [2] > https://protect2.fireeye.com/url?k=6793d282-3b19f06b-67939219-0cc47ad93e2e-b61d4c15f019d018&u=https://review.opendev.org/#/c/645316/ > [3] > https://protect2.fireeye.com/url?k=975e0f6d-cbd42d84-975e4ff6-0cc47ad93e2e-9cf6144999db0dfb&u=https://review.opendev.org/#/q/topic:bp/request-filter-image-types+(status:open+OR+status:merged) > [4] > https://protect2.fireeye.com/url?k=495a140e-15d036e7-495a5495-0cc47ad93e2e-745cad547e47b7cc&u=https://opendev.org/openstack/nova/src/commit/5934c5dc6932fbf19ca7f3011c4ccc07b0038ac4/nova/objects/request_spec.py#L93-L100 > [5] > https://protect2.fireeye.com/url?k=733c10d0-2fb63239-733c504b-0cc47ad93e2e-25f07d70c4385f31&u=https://review.opendev.org/#/c/647396/ > From marcin.juszkiewicz at linaro.org Tue May 7 07:34:26 2019 From: marcin.juszkiewicz at linaro.org (Marcin Juszkiewicz) Date: Tue, 7 May 2019 09:34:26 +0200 Subject: [kolla][neutron][networking-infoblox] Python3 issue: "TypeError: Unicode-objects must be encoded before hashing" In-Reply-To: <1d56ad05-9fa4-16b7-5cbe-af5c339f58b1@linaro.org> References: <1d56ad05-9fa4-16b7-5cbe-af5c339f58b1@linaro.org> Message-ID: <42626a00-df14-3d9b-e52c-1dfc3eeb639f@linaro.org> W dniu 07.05.2019 o 08:42, Marcin Juszkiewicz pisze: > I am working on making Kolla images Python 3 only. So far images are py3 > but then there are issues during deployment phase which I do not know > how to solve. > > https://review.opendev.org/#/c/642375/ is a patch. > > 'kolla-ansible-ubuntu-source' CI job deploys using Ubuntu 18.04 based > images. And fails. > > Log [1] shows something which looks like 'works in py2, not tested with py3' > code: > > 1. http://logs.openstack.org/75/642375/19/check/kolla-ansible-ubuntu-source/40878ed/primary/logs/ansible/deploy > > > " File \"/var/lib/kolla/venv/lib/python3.6/site-packages/networking_infoblox/neutron/common/utils.py\", line 374, in get_hash", > " return hashlib.md5(str(time.time())).hexdigest()", > "TypeError: Unicode-objects must be encoded before hashing" > ], > > Any ideas which project goes wrong? And how/where to fix it? > Found something interesting. And no idea who to blame... We use http://tarballs.openstack.org/networking-infoblox/networking-infoblox-master.tar.gz during development. But master == 2.0.3dev97 So I checked on tarballs and on Pypi: newton = 9.0.1 ocata = 10.0.1 pike = 11.0.1 queens = 12.0.1 rocky = 13.0.0 (tarballs only) stein is not present Each of those releases were done from same code but changelog always says 2.0.2 -> current.release.0 -> current.release.update Can not it be versioned in sane way? 2.0.2 -> 9.0.0 -> 10.0.0 -> 11.0.0 -> 12.0.0 -> 13.0.0 -> 13.x.ydevz? 
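As for the TypeError itself, it is the usual Python 3 hashlib complaint: md5() wants bytes, not str. A minimal sketch of a py3-safe version of the get_hash() helper shown in the traceback (the real fix would of course have to land in networking-infoblox) would be:

    import hashlib
    import time

    def get_hash():
        # Python 3: hashlib only accepts bytes, so encode the string before
        # hashing. This also still works under Python 2.
        return hashlib.md5(str(time.time()).encode('utf-8')).hexdigest()

The same pattern applies anywhere a str is fed straight into hashlib once the code runs under Python 3.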
From cjeanner at redhat.com Tue May 7 07:56:07 2019 From: cjeanner at redhat.com (=?UTF-8?Q?C=c3=a9dric_Jeanneret?=) Date: Tue, 7 May 2019 09:56:07 +0200 Subject: [TripleO][PTG] Validation summary Message-ID: Hello all, Last Saturday, we had a session about two topics: - Validation Framework - In-flight validations Here's a summary about the different discussions around those topics. ## Current state: - all the existing validations within "tripleo-validations" have been moved to the new format (proper ansible roles with dedicated playbook). Big thumb up to the involved people for the hard work! - Mistral runs them from the hard drive instead of using swift - Possibility to run validations through the CLI using the new "openstack tripleo validator" subcommand - Possibility to run the validations directly with ansible-playbook - Blog posts with demos and some explanations: ° https://cjeanner.github.io/openstack/tripleo/validations/2019/04/24/validation-framework.html ° https://cjeanner.github.io/openstack/tripleo/validations/2019/04/25/in-flight-validations.html ° https://cjeanner.github.io/openstack/tripleo/validations/2019/04/26/in-flight-validations-II.html ## TODO - Refactor tripleoclient code regarding the way ansible is called, in order to allow bypassing mistral (useful if mistral is broken or not available, like on a standalone deploy) - Get more validations from the Services teams (Compute, Neutron, and so on) - CI integration: get a job allowing to validate the framework (running the no-op validation and group) as well as validations themselves - Doc update (WIP: https://review.opendev.org/654943) - Check how the tripleo-validations content might be backported down to Pike or even Newton. We don't consider the CLI changes, since the cherry-picks will be more than painful, and might break things in a really bad way. You can find the whole, raw content on the following pad: https://etherpad.openstack.org/p/tripleo-ptg-train-validations In case you have questions or remarks, or want to dig further in the topics present on that pad, feel free to contact me or just run a thread on the ML :). Cheers, C. -- Cédric Jeanneret Software Engineer DFG:DF -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From mark at stackhpc.com Tue May 7 09:01:47 2019 From: mark at stackhpc.com (Mark Goddard) Date: Tue, 7 May 2019 10:01:47 +0100 Subject: [kolla] Denver summit summary Message-ID: Hi, Here are links to slides from the kolla project sessions at the summit. * Project update [1] * Project onboarding [2] There should be a video of the update available in due course. We also had a user feedback session, the Etherpad notes are at [3] Picking out some themes from the user feedback: * TLS everywhere * Nova Cells v2 * SELinux * podman & buildah support for CentOS/RHEL 8 I think we're in a good position to support the first two in the Train cycle since they have some work in progress. The latter two will require some investigation. [1] https://docs.google.com/presentation/d/1npG6NGGsJxdXFzmPLfrDsWMhxeDVY9-nBfmDBvrAAlQ/edit?usp=sharing [2] https://docs.google.com/presentation/d/11gGW93Xu7DQo_G1LiRDm6thfB5gNLm39SHuKcgSW8FQ/edit?usp=sharing [3] https://etherpad.openstack.org/p/DEN-train-kolla-feedback Cheers, Mark -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mark at stackhpc.com Tue May 7 09:02:57 2019 From: mark at stackhpc.com (Mark Goddard) Date: Tue, 7 May 2019 10:02:57 +0100 Subject: [kayobe] Denver summit summary Message-ID: Hi, The Kayobe feedback & roadmap session Etherpad notes are at [1]. A major theme was documentation, including reference configurations and more around day 2 ops. On Tuesday evening we ran a packed workshop [2] on deploying OpenStack via Kayobe. It went pretty smoothly overall, and we had some positive feedback. Thanks to Packet for providing the infrastructure - the bare metal servers let us cover a lot of ground in a short time. Anyone wanting to try out the workshop can do so using a VM or bare metal server running CentOS 7 with at least 32GB RAM and 40GB disk. Follow the 'Full Deploy' section [3] in the README. I spoke with many people during the week who feel that Kayobe could be a great fit for them, which is really encouraging. Please look out for new users and contributors reaching out via IRC and the mailing list and help them get up to speed. [1] https://etherpad.openstack.org/p/DEN-19-kayobe-feedback-roadmap [2] https://www.openstack.org/summit/denver-2019/summit-schedule/events/23426/a-universe-from-nothing-containerised-openstack-deployment-using-kolla-ansible-and-kayobe [3] https://github.com/stackhpc/a-universe-from-nothing#full-deploy Cheers, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Tue May 7 09:07:28 2019 From: mark at stackhpc.com (Mark Goddard) Date: Tue, 7 May 2019 10:07:28 +0100 Subject: [ptg][kolla][openstack-ansible][tripleo] PTG cross-project summary Message-ID: Hi, This is a summary of the ad-hoc cross project session between the OpenStack Ansible and Kolla teams. It occurred to me that our two teams likely face similar challenges, and there are areas we could collaborate on. I've tagged TripleO also since the same applies there. [Collaboration on approach to features] This was my main reason for proposing the session - there are features and changes that all deployment tools need to make. Examples coming up include support for upgrade checkers and IPv6. Rather than work in isolation and solve the same problem in different ways, perhaps we could share our experiences. The implementations will differ, but providing a reasonably consistent feel between deployment tools can't be a bad thing. As a test case, we briefly discussed our experience with the upgrade checker support added in Stein, and found that our expectation of how it would work was fairly aligned in the room, but not aligned with how I understand it to actually work (it's more of a post-upgrade check than a pre-upgrade check). I was also able to point the OSA team at the placement migration code added to Kolla in the Stein release, which should save them some time, and provide more eyes on our code. I'd like to pursue this more collaborative approach during the Train release where it fits. Upgrade checkers seems a good place to start, but am open to other ideas such as IPv6 or Python 3. [OSA in Kayobe] This was my wildcard - add support for deploying OpenStack via OSA in Kayobe as an alternative to Kolla Ansible. It could be a good fit for those users who want to use OSA but don't have a provisioning system. This wasn't true of anyone in the room, and lack of resources deters from 'build it and they will come'. Still, the seed is planted, it may yet grow. 
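Coming back to the upgrade checker test case above: mechanically it boils down to running the per-project check command and branching on its exit code. A rough sketch, purely for illustration and not how either tool actually wires it up (it assumes the documented nova-status convention of 0 = success, 1 = warnings, 2 = failures; the Stein community goal was for the other per-project *-status commands to behave the same way):

    import subprocess

    def run_upgrade_check(command=("nova-status", "upgrade", "check")):
        # Sketch only: run the project's upgrade check and act on the exit
        # code (0 = all checks passed, 1 = warnings, anything else = failure).
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode == 0:
            print("upgrade checks passed")
        elif result.returncode == 1:
            print("upgrade checks passed with warnings:\n%s" % result.stdout)
        else:
            raise RuntimeError("upgrade checks failed:\n%s" % result.stdout)

    if __name__ == "__main__":
        run_upgrade_check()

Where exactly this runs in the upgrade sequence (pre- vs post-upgrade) is precisely the expectation gap noted above, and something worth aligning on across the tools.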
[Sharing Ansible roles] mnaser had an interesting idea: add support for deploying kolla containers to the OSA Ansible service roles. We could then use those roles within Kolla Ansible to avoid duplication of code. There is definitely some appeal to this in theory. In practice however I feel that the two deployment models are sufficiently different that it would add significantly complexity to both projects. Part of the (relative) simplicity and regularity of Kolla Ansible is enabled by handing off installation and other tasks to Kolla. One option that might work however is sharing some of the lower level building blocks. mnaser offered to make a PoC for using https://github.com/openstack/ansible-config_template to generate configuration in Kolla Ansible in place of merge_config and merge_yaml. It requires some changes to that role to support merging a list of source template files. We'd also need to add an external dependency to our 'monorepo', or 'vendor' the module - trade offs to make in complexity vs. maintaining our own module. I'd like to thank the OSA team for hosting the discussion - it was great to meet the team and share experience. Cheers, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Tue May 7 09:25:06 2019 From: mark at stackhpc.com (Mark Goddard) Date: Tue, 7 May 2019 10:25:06 +0100 Subject: [kolla] Denver summit summary In-Reply-To: References: Message-ID: On Tue, 7 May 2019 at 10:01, Mark Goddard wrote: > Hi, > > Here are links to slides from the kolla project sessions at the summit. > > * Project update [1] > * Project onboarding [2] > > There should be a video of the update available in due course. > > We also had a user feedback session, the Etherpad notes are at [3] > > Picking out some themes from the user feedback: > > * TLS everywhere > * Nova Cells v2 > * SELinux > * podman & buildah support for CentOS/RHEL 8 > > I think we're in a good position to support the first two in the Train > cycle since they have some work in progress. The latter two will require > some investigation. > > [1] > https://docs.google.com/presentation/d/1npG6NGGsJxdXFzmPLfrDsWMhxeDVY9-nBfmDBvrAAlQ/edit?usp=sharing > [2] > https://docs.google.com/presentation/d/11gGW93Xu7DQo_G1LiRDm6thfB5gNLm39SHuKcgSW8FQ/edit?usp=sharing > It was brought to my attention that Google slides might not be accessible from some places. I've uploaded to slideshare also, but it appears this is blocked in China. Is there another location where they can be hosted that is accessible from China? > [3] https://etherpad.openstack.org/p/DEN-train-kolla-feedback > > Cheers, > Mark > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From john at johngarbutt.com Tue May 7 09:27:21 2019 From: john at johngarbutt.com (John Garbutt) Date: Tue, 7 May 2019 10:27:21 +0100 Subject: [nova][ptg][keystone] Summary: Unified Limits and Policy Refresh in Nova Message-ID: Hi, A summary of the nova/keystone cross project PTG session. 
Full etherpad is here: https://etherpad.openstack.org/p/ptg-train-xproj-nova-keystone 1) Policy Refresh Spec: https://review.openstack.org/#/c/547850/ Notes: * Better defaults to make policy changes easier * Move from current to: System Admin vs Project Member * Also add System Reader and Project Reader ** Above requires more granular policy for some APIs ** Also change DB check: system or admin, eventually drop it * Lots of testing to avoid regressions * Patrole may be useful, but initial focus on in-tree tests Actions: * johnthetubaguy to update spec * melwitt, gmann and johnthetubaguy happy to work on these * upload POC for testing plan 2) Unified Limits Spec: https://review.opendev.org/#/c/602201/ Notes: * only move instances and resource class based quotas to keystone * work on tooling to help operators migrate to keystone based limits * adopt oslo.limit to enforce unified limits * eventually we get hierarchical limits and the "per flavor" use case Actions: * johnthetubaguy to update the spec * johnthetubaguy, melwitt, alex_xu happy to work on these things * work on POC to show approach Thanks, johnthetubaguy -------------- next part -------------- An HTML attachment was scrubbed... URL: From mark at stackhpc.com Tue May 7 11:52:18 2019 From: mark at stackhpc.com (Mark Goddard) Date: Tue, 7 May 2019 12:52:18 +0100 Subject: [kolla] Virtual PTG scheduling poll In-Reply-To: References: Message-ID: The results are in! There were a few ties, so I picked the two sessions that most cores could attend and was most friendly to timezones of the attendees. Tues May 28th, 12:00 - 16:00 UTC Weds May 29th, 12:00 - 16:00 UTC Lets try to cover as much as possible in the first session, then decide if we need another. Unless anyone has any other suggestions, I propose we use Google hangouts for voice and/or video. Hangout: https://meet.google.com/pbo-boob-csh?hs=122 Calendar: https://calendar.google.com/event?action=TEMPLATE&tmeid=MGE1MHRuN2s2cTdkMm12YWtpMnY5YWZlNHRfMjAxOTA1MjhUMTIwMDAwWiBtYXJrQHN0YWNraHBjLmNvbQ&tmsrc=mark%40stackhpc.com&scp=ALL Cheers, Mark On Tue, 30 Apr 2019 at 19:01, Mark Goddard wrote: > Hi, > > We struggled to find a suitable date, so I've added another two weeks. > Please update your responses. > > https://doodle.com/poll/adk2smds76d8db4u > > Thanks, > Mark > > On Mon, 15 Apr 2019 at 07:34, Mark Goddard wrote: > >> Hi, >> >> Since most of the Kolla team aren't attending the PTG, we've agreed to >> hold a virtual PTG. >> >> We agreed to start with two 4 hour sessions. We can finish early or >> schedule another session, depending on how we progress. We'll use some >> video conferencing software TBD. >> >> I've created a Doodle poll here [2], please fill it in if you hope to >> attend. Times are in UTC. >> >> Please continue to fill out the planning Etherpad [1]. >> >> Thanks, >> Mark >> >> [1] https://etherpad.openstack.org/p/kolla-train-ptg >> [2] https://doodle.com/poll/adk2smds76d8db4u >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From balazs.gibizer at ericsson.com Tue May 7 12:05:40 2019 From: balazs.gibizer at ericsson.com (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Tue, 7 May 2019 12:05:40 +0000 Subject: [placement][nova][ptg] Summary: Nested Magic With Placement In-Reply-To: References: Message-ID: <1557230737.31620.1@smtp.office365.com> On Fri, May 3, 2019 at 8:22 PM, Chris Dent wrote: > > * A 'mappings' key will be added to the 'allocations' object in the > allocation_candidates response that will support request group > mapping. 
I refreshed the spec in the following way: 1) updated the spec in the nova-spec repo to capture the agreement [1] 2) copied the spec from the nova-spec repo to the placement repo [2] 3) uploaded the both spec updates [1][2] 4) abandoned the nova-spec [1] by pointing to the placement spec 5) marked the nova bp [3] in launchpad as superseded pointing to the placement story [4]. [1] https://review.opendev.org/#/c/597601/ [2] https://review.opendev.org/#/c/657582/ [3] https://blueprints.launchpad.net/nova/+spec/placement-resource-provider-request-group-mapping-in-allocation-candidates [4] https://storyboard.openstack.org/#!/story/2005575 Please note that I removed myself as 'Primary assignee' in the spec as this work has low prio in Train from my side so it is free for anybody to take over. I will try to help at least with the review. Cheers, gibi From hongbin034 at gmail.com Tue May 7 12:14:59 2019 From: hongbin034 at gmail.com (Hongbin Lu) Date: Tue, 7 May 2019 08:14:59 -0400 Subject: [devstack-plugin-container][zun][kuryr] Extend core team for devstack-plugin-container Message-ID: Hi all, I propose to add Zun and Kuryr core team into devstack-plugin-container. Right now, both Zun and Kuryr are using that plugin and extending the core team would help accelerating the code review process. Please let me know if there is any concern of the proposal. Best regards, Hongbin -------------- next part -------------- An HTML attachment was scrubbed... URL: From aspiers at suse.com Tue May 7 12:58:18 2019 From: aspiers at suse.com (Adam Spiers) Date: Tue, 7 May 2019 13:58:18 +0100 Subject: [tc][all][airship] Github mirroring (or lack thereof) for unofficial projects In-Reply-To: References: <20190503190538.GB3377@localhost.localdomain> <20190503230525.a3vxsnliklitnei4@arabian.linksys.moosehall> Message-ID: <20190507125818.ykue2rycwcrqjhms@pacific.linksys.moosehall> Roman Gorshunov wrote: >Thanks, Adam. > >I haven't been on PTG, sorry. It's good that there has been a >discussion and agreement is reached. Oh sorry, I assumed you must have been in the room when we discussed it, since your mail arrived just after then ;-) But it was just a coincidence! :-) From jaypipes at gmail.com Tue May 7 13:02:53 2019 From: jaypipes at gmail.com (Jay Pipes) Date: Tue, 7 May 2019 09:02:53 -0400 Subject: [ops][nova]Logging in nova and other openstack projects In-Reply-To: References: Message-ID: On 05/06/2019 05:56 PM, Jean-Philippe Méthot wrote: > Hi, > > We’ve been modifying our login habits for Nova on our Openstack setup to > try to send only warning level and up logs to our log servers. To do so, > I’ve created a logging.conf and configured logging according to the > logging module documentation. While what I’ve done works, it seems to be > a very convoluted process for something as simple as changing the > logging level to warning. We worry that if we upgrade and the syntax for > this configuration file changes, we may have to push more changes > through ansible than we would like to. It's unlikely that the syntax for the logging configuration file will change since it's upstream Python, not OpenStack or Nova that is the source of this syntax. That said, if all you want to do is change some or all package default logging levels, you can change the value of the CONF.default_log_levels option. The default_log_levels CONF option is actually derived from the oslo_log package that is used by all OpenStack service projects. 
It's default value is here: https://github.com/openstack/oslo.log/blob/29671ef2bfacb416d397abc57170bb090b116f68/oslo_log/_options.py#L19-L31 So, if you don't want to mess with the standard Python logging conf, you can just change that CONF.default_log_levels option. Note that if you do specify a logging config file using a non-None CONF.log_config_append value, then all other logging configuration options (like default_log_levels) are ignored). Best, -jay From mark at stackhpc.com Tue May 7 13:23:12 2019 From: mark at stackhpc.com (Mark Goddard) Date: Tue, 7 May 2019 14:23:12 +0100 Subject: [Deploy problem] deploy openstack+monasca with kolla-ansible 8.0.0.0rc1. In-Reply-To: References: Message-ID: On Fri, 3 May 2019 at 09:13, Ming-Che Liu wrote: > Apologies,this mail will attach rabbitmq log file(ues command "docker logs > --follow rabbitmq") for debug. > > Logs in /var/lib/docker/volumes/kolla_logs/_data/rabbitmq are empty. > > Hmm, there's not much to go on there. Are you now running Ubuntu 18.04? One thing that can help is running the container manually via docker run. It can take a little while to work out the right arguments to pass, but it's possible. Mark > thanks. > > Regards, > > Ming-Che > > Ming-Che Liu 於 2019年5月3日 週五 下午3:26寫道: > >> Hi Mark, >> >> I tried to deploy openstack+monasca with kolla-ansible 8.0.0.0rc1(in the >> same machine), but still encounter some fatal error. >> >> The attached file:golbals.yml is my setting, machine_package_setting is >> machine environment setting. >> >> The error is: >> RUNNING HANDLER [rabbitmq : Waiting for rabbitmq to start on first node] >> ************************************************************ >> fatal: [localhost]: FAILED! => {"changed": true, "cmd": "docker exec >> rabbitmq rabbitmqctl wait /var/lib/rabbitmq/mnesia/rabbitmq.pid", "delta": >> "0:00:00.861054", "end": "2019-05-03 15:17:42.387873", "msg": "non-zero >> return code", "rc": 137, "start": "2019-05-03 15:17:41.526819", "stderr": >> "", "stderr_lines": [], "stdout": "", "stdout_lines": []} >> >> When I use command "docker inspect rabbitmq_id |grep RestartCount", I >> find rabbitmq will restart many times >> >> such as: >> >> kaga at agre-an21:~$ sudo docker inspect 5567f37cc78a |grep RestartCount >> "RestartCount": 15, >> >> Could please help to solve this problem? Thanks. >> >> Regards, >> >> Ming-Che >> >> >> >> >> >> >> >> Ming-Che Liu 於 2019年5月3日 週五 上午9:22寫道: >> >>> Hi Mark, >>> >>> Sure, I will do that, thanks. >>> >>> Regards, >>> >>> Ming-Che >>> >>> Mark Goddard 於 2019年5月3日 週五 上午1:12寫道: >>> >>>> >>>> >>>> On Wed, 1 May 2019 at 17:10, Ming-Che Liu wrote: >>>> >>>>> Hello, >>>>> >>>>> I deployed openstack+monasca with kolla-ansible 8.0.0.0rc1. >>>>> >>>>> I follow the steps as mentioned in >>>>> https://docs.openstack.org/kolla-ansible/latest/user/quickstart.html >>>>> >>>>> The setting in my computer's globals.yml as same as [Quick Start] >>>>> tutorial (attached file: globals.yml is my setting). >>>>> >>>>> My machine environment as following: >>>>> OS: Ubuntu 16.04 >>>>> Kolla-ansible verions: 8.0.0.0rc1 >>>>> ansible version: 2.7 >>>>> >>>>> When I execute [bootstrap-servers] and [prechecks], it seems ok (no >>>>> fatal error or any interrupt). >>>>> >>>>> But when I execute [deploy], it will occur some error about >>>>> rabbitmq(when I set enable_rabbitmq:yes) and nova compute service(when I >>>>> set enable_rabbitmq:no). >>>>> >>>>> I have some detail screenshot about the errors as attached files, >>>>> could you please help me to solve this problem? 
>>>>> >>>>> Thank you very much. >>>>> >>>>> [Attached file description]: >>>>> globals.yml: my computer's setting about kolla-ansible >>>>> >>>>> As mentioned above, the following pictures show the errors, the >>>>> rabbitmq error will occur if I set [enable_rabbitmq:yes], the nova compute >>>>> service error will occur if I set [enable_rabbitmq:no]. >>>>> >>>> >>>> Hi Ming-Che, >>>> >>>> Since Stein, we no longer test Kolla Ansible with Ubuntu 16.04 >>>> upstream. Could you try again using Ubuntu 18.04? >>>> >>>> Regards, >>>> Mark >>>> >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From jim at jimrollenhagen.com Tue May 7 13:25:17 2019 From: jim at jimrollenhagen.com (Jim Rollenhagen) Date: Tue, 7 May 2019 09:25:17 -0400 Subject: [kolla] Denver summit summary In-Reply-To: References: Message-ID: On Tue, May 7, 2019 at 5:26 AM Mark Goddard wrote: > > > On Tue, 7 May 2019 at 10:01, Mark Goddard wrote: > >> Hi, >> >> Here are links to slides from the kolla project sessions at the summit. >> >> * Project update [1] >> * Project onboarding [2] >> >> There should be a video of the update available in due course. >> >> We also had a user feedback session, the Etherpad notes are at [3] >> >> Picking out some themes from the user feedback: >> >> * TLS everywhere >> * Nova Cells v2 >> * SELinux >> * podman & buildah support for CentOS/RHEL 8 >> >> I think we're in a good position to support the first two in the Train >> cycle since they have some work in progress. The latter two will require >> some investigation. >> >> [1] >> https://docs.google.com/presentation/d/1npG6NGGsJxdXFzmPLfrDsWMhxeDVY9-nBfmDBvrAAlQ/edit?usp=sharing >> [2] >> https://docs.google.com/presentation/d/11gGW93Xu7DQo_G1LiRDm6thfB5gNLm39SHuKcgSW8FQ/edit?usp=sharing >> > > It was brought to my attention that Google slides might not be accessible > from some places. I've uploaded to slideshare also, but it appears this is > blocked in China. Is there another location where they can be hosted that > is accessible from China? > Maybe upload as PDFs in a patch to kolla. No need to merge, but folks can checkout the patch to get the files. // jim -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Tue May 7 13:59:33 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 7 May 2019 13:59:33 +0000 Subject: [kolla][neutron][networking-infoblox] Python3 issue: "TypeError: Unicode-objects must be encoded before hashing" In-Reply-To: <42626a00-df14-3d9b-e52c-1dfc3eeb639f@linaro.org> References: <1d56ad05-9fa4-16b7-5cbe-af5c339f58b1@linaro.org> <42626a00-df14-3d9b-e52c-1dfc3eeb639f@linaro.org> Message-ID: <20190507135932.y4j24clfc43nj6cs@yuggoth.org> On 2019-05-07 09:34:26 +0200 (+0200), Marcin Juszkiewicz wrote: [...] > Found something interesting. And no idea who to blame... > > We use > http://tarballs.openstack.org/networking-infoblox/networking-infoblox-master.tar.gz > during development. > > But master == 2.0.3dev97 > > So I checked on tarballs and on Pypi: > > newton = 9.0.1 > ocata = 10.0.1 > pike = 11.0.1 > queens = 12.0.1 > rocky = 13.0.0 (tarballs only) > stein is not present > > Each of those releases were done from same code but changelog always > says 2.0.2 -> current.release.0 -> current.release.update > > > Can not it be versioned in sane way? > > 2.0.2 -> 9.0.0 -> 10.0.0 -> 11.0.0 -> 12.0.0 -> 13.0.0 -> 13.x.ydevz? 
The reason for this is that our present practice for service projects in OpenStack (which the x/networking-infoblox repository seems to partly follow) is to tag major releases after creating stable branches rather than before, and those tags therefore end up missing in the master branch history from which the master branch tarballs you're consuming are created. We used to have a process of merging the release tags back into the master branch history to solve this, but ceased a few years ago because it complicated calculating release notes across various branches. Instead official projects following this release model now receive an auto-proposed change to master as part of the cycle release process which sets a Sem-Ver commit message footer to increment the minor version past the rc1 tag (which is the stable branch point for them). Popular alternatives to this are either to tag an early prerelease on master soon after branching, or follow a different release process where branches are created when/after tagging rather than before (this is more typical of shared libraries in particular). One way in which x/networking-infoblox is not fully following the same release model as official services is that they don't seem to be tagging release candidates on master (or at all for that matter), which would partly mitigate this as you would instead see versions like 13.0.0.0rc2.dev3. Another way it's not fully following that model is, as you have observed, there's no stable/stein branch for it yet. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From fungi at yuggoth.org Tue May 7 14:02:57 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 7 May 2019 14:02:57 +0000 Subject: [kolla] Denver summit summary In-Reply-To: References: Message-ID: <20190507140257.7rmlio6he3gov6gn@yuggoth.org> On 2019-05-07 09:25:17 -0400 (-0400), Jim Rollenhagen wrote: > On Tue, May 7, 2019 at 5:26 AM Mark Goddard wrote: [...] > > It was brought to my attention that Google slides might not be > > accessible from some places. I've uploaded to slideshare also, > > but it appears this is blocked in China. Is there another > > location where they can be hosted that is accessible from China? > > Maybe upload as PDFs in a patch to kolla. No need to merge, but > folks can checkout the patch to get the files. If these are slides for a summit session, the event coordinators generally send a message out to all speakers shortly following the conference with instructions on how/where to upload their slide decks so they can be served alongside the session abstracts. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From sean.mcginnis at gmx.com Tue May 7 14:20:47 2019 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Tue, 7 May 2019 09:20:47 -0500 Subject: [cinder][ops] Nested Quota Driver Use? In-Reply-To: References: <20190502003249.GA1432@sm-workstation> Message-ID: <20190507142046.GA3999@sm-workstation> On Fri, May 03, 2019 at 06:58:41PM +0000, Tim Bell wrote: > We're interested in the overall functionality but I think unified limits is the place to invest and thus would not have any problem deprecating this driver. > > We'd really welcome this being implemented across all the projects in a consistent way. 
The sort of functionality proposed in https://techblog.web.cern.ch/techblog/post/nested-quota-models/ would need Nova/Cinder/Manila at miniumum for CERN to switch. > > So, no objections to deprecation but strong support to converge on unified limits. > > Tim > Thanks Tim, that helps. Since there wasn't any other feedback, and no one jumping up to say they are using it today, I have submitted https://review.opendev.org/657511 to deprecated the current quota driver so we don't have to try to refactor that functionality into whatever we need to do for the unified limits support. If anyone has any concerns about this plan, please feel free to raise them here or on that review. Thanks! Sean From cjeanner at redhat.com Tue May 7 14:33:57 2019 From: cjeanner at redhat.com (=?UTF-8?Q?C=c3=a9dric_Jeanneret?=) Date: Tue, 7 May 2019 16:33:57 +0200 Subject: [TripleO][Validations] Tag convention Message-ID: <3c383d8d-54fa-b054-f0ad-b97ed67ba03f@redhat.com> Dear all, We're currently working hard in order to provide a nice way to run validations within a deploy (aka in-flight validations). We can already call validations provided by the tripleo-validations package[1], it's working just fine. Now comes the question: "how can we disable the validations?". In order to do that, we propose to use a standard tag in the ansible roles/playbooks, and to add a "--skip-tags " when we disable the validations via the CLI or configuration. After a quick check in the tripleoclient code, there apparently is a tag named "validation", that can already be skipped from within the client. So, our questions: - would the reuse of "validation" be OK? - if not, what tag would be best in order to avoid confusion? We also have the idea to allow to disable validations per service. For this, we propose to introduce the following tag: - validation-, like "validation-nova", "validation-neutron" and so on What do you think about those two additions? Thank you all for your feedbacks and idea! Cheers, C. [1] as shown here: https://cjeanner.github.io/openstack/tripleo/validations/2019/04/26/in-flight-validations-II.html -- Cédric Jeanneret Software Engineer - OpenStack Platform Red Hat EMEA https://www.redhat.com/ -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From mark at stackhpc.com Tue May 7 14:36:22 2019 From: mark at stackhpc.com (Mark Goddard) Date: Tue, 7 May 2019 15:36:22 +0100 Subject: [kolla] Denver summit summary In-Reply-To: <20190507140257.7rmlio6he3gov6gn@yuggoth.org> References: <20190507140257.7rmlio6he3gov6gn@yuggoth.org> Message-ID: On Tue, 7 May 2019 at 15:03, Jeremy Stanley wrote: > On 2019-05-07 09:25:17 -0400 (-0400), Jim Rollenhagen wrote: > > On Tue, May 7, 2019 at 5:26 AM Mark Goddard wrote: > [...] > > > It was brought to my attention that Google slides might not be > > > accessible from some places. I've uploaded to slideshare also, > > > but it appears this is blocked in China. Is there another > > > location where they can be hosted that is accessible from China? > > > > Maybe upload as PDFs in a patch to kolla. No need to merge, but > > folks can checkout the patch to get the files. > > If these are slides for a summit session, the event coordinators > generally send a message out to all speakers shortly following the > conference with instructions on how/where to upload their slide > decks so they can be served alongside the session abstracts. > Thanks, I'll wait for that. 
> -- > Jeremy Stanley > -------------- next part -------------- An HTML attachment was scrubbed... URL: From johnsomor at gmail.com Tue May 7 14:37:21 2019 From: johnsomor at gmail.com (Michael Johnson) Date: Tue, 7 May 2019 07:37:21 -0700 Subject: [octavia] Error while creating amphora In-Reply-To: <0994c2fb-a2c1-89f8-10ca-c3d0d9bf79e2@gmx.com> References: <0994c2fb-a2c1-89f8-10ca-c3d0d9bf79e2@gmx.com> Message-ID: Yes, we have had discussions with the nova team about this. Their response was that the current config drive method we are using is a stable interface and will not go away. We also asked that the "user_data" method storage size be increased to a reasonable size that could be used for our current needs. Even growing that to an old floppy disk size would address our needs, but this was not committed to. Michael On Mon, May 6, 2019 at 8:54 AM Volodymyr Litovka wrote: > > Hi Michael, > > regarding file injection vs config_drive - > https://specs.openstack.org/openstack/nova-specs/specs/queens/implemented/deprecate-file-injection.html > - don't know when this will happen, but you see - people are thinking in > this way. > > On 5/2/19 5:58 PM, Michael Johnson wrote: > > Volodymyr, > > > > It looks like you have enabled "user_data_config_drive" in the > > octavia.conf file. Is there a reason you need this? If not, please > > set it to False and it will resolve your issue. > > > > It appears we have a python3 bug in the "user_data_config_drive" > > capability. It is not generally used and appears to be missing test > > coverage. > > > > I have opened a story (bug) on your behalf here: > > https://storyboard.openstack.org/#!/story/2005553 > > > > Michael > > > > On Thu, May 2, 2019 at 4:29 AM Volodymyr Litovka wrote: > >> Dear colleagues, > >> > >> I'm using Openstack Rocky and trying to launch Octavia 4.0.0. 
After all installation steps I've got an error during 'openstack loadbalancer create' with the following log: > >> > >> DEBUG octavia.controller.worker.tasks.compute_tasks [-] Compute create execute for amphora with id d037721f-2cf9-492e-99cb-0be5874da0f6 execute /opt/openstack/lib/python3.6/site-packages/octavia/controller/worker/tasks/compute_tasks.py:63 > >> ERROR octavia.controller.worker.tasks.compute_tasks [-] Compute create for amphora id: d037721f-2cf9-492e-99cb-0be5874da0f6 failed: TypeError: can't concat str to bytes > >> ERROR octavia.controller.worker.tasks.compute_tasks Traceback (most recent call last): > >> ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/octavia/controller/worker/tasks/compute_tasks.py", line 94, in execute > >> ERROR octavia.controller.worker.tasks.compute_tasks config_drive_files) > >> ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/octavia/common/jinja/user_data_jinja_cfg.py", line 38, in build_user_data_config > >> ERROR octavia.controller.worker.tasks.compute_tasks return self.agent_template.render(user_data=user_data) > >> ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/jinja2/asyncsupport.py", line 76, in render > >> ERROR octavia.controller.worker.tasks.compute_tasks return original_render(self, *args, **kwargs) > >> ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/jinja2/environment.py", line 1008, in render > >> ERROR octavia.controller.worker.tasks.compute_tasks return self.environment.handle_exception(exc_info, True) > >> ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/jinja2/environment.py", line 780, in handle_exception > >> ERROR octavia.controller.worker.tasks.compute_tasks reraise(exc_type, exc_value, tb) > >> ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/jinja2/_compat.py", line 37, in reraise > >> ERROR octavia.controller.worker.tasks.compute_tasks raise value.with_traceback(tb) > >> ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/octavia/common/jinja/templates/user_data_config_drive.template", line 29, in top-level template code > >> ERROR octavia.controller.worker.tasks.compute_tasks {{ value|indent(8) }} > >> ERROR octavia.controller.worker.tasks.compute_tasks File "/opt/openstack/lib/python3.6/site-packages/jinja2/filters.py", line 557, in do_indent > >> ERROR octavia.controller.worker.tasks.compute_tasks s += u'\n' # this quirk is necessary for splitlines method > >> ERROR octavia.controller.worker.tasks.compute_tasks TypeError: can't concat str to bytes > >> ERROR octavia.controller.worker.tasks.compute_tasks > >> WARNING octavia.controller.worker.controller_worker [-] Task 'STANDALONE-octavia-create-amp-for-lb-subflow-octavia-cert-compute-create' (06134192-def9-420c-9feb-0d08a068f3b2) transitioned into state 'FAILURE' from state 'RUNNING' > >> > >> Any advises where is the problem? > >> > >> My environment: > >> - Openstack Rocky > >> - Ubuntu 18.04 > >> - Octavia installed in virtualenv using pip install: > >> # pip list |grep octavia > >> octavia 4.0.0 > >> octavia-lib 1.1.1 > >> python-octaviaclient 1.8.0 > >> > >> Thank you. > >> > >> -- > >> Volodymyr Litovka > >> "Vision without Execution is Hallucination." 
-- Thomas Edison > >> > >> -- > >> Volodymyr Litovka > >> "Vision without Execution is Hallucination." -- Thomas Edison > > -- > Volodymyr Litovka > "Vision without Execution is Hallucination." -- Thomas Edison > From jp.methot at planethoster.info Tue May 7 15:15:47 2019 From: jp.methot at planethoster.info (=?utf-8?Q?Jean-Philippe_M=C3=A9thot?=) Date: Tue, 7 May 2019 11:15:47 -0400 Subject: [ops][nova]Logging in nova and other openstack projects In-Reply-To: References: Message-ID: Hi, I’ve just tried setting everything to warn through the nova.conf option default_log_levels, as suggested. However, I’m still getting info level logs from the resource tracker like this : INFO nova.compute.resource_tracker Could the compute resource tracker logs be managed by another parameter than what’s in the default list for that configuration option? Best regards, Jean-Philippe Méthot Openstack system administrator Administrateur système Openstack PlanetHoster inc. > Le 7 mai 2019 à 09:02, Jay Pipes a écrit : > > On 05/06/2019 05:56 PM, Jean-Philippe Méthot wrote: >> Hi, >> We’ve been modifying our login habits for Nova on our Openstack setup to try to send only warning level and up logs to our log servers. To do so, I’ve created a logging.conf and configured logging according to the logging module documentation. While what I’ve done works, it seems to be a very convoluted process for something as simple as changing the logging level to warning. We worry that if we upgrade and the syntax for this configuration file changes, we may have to push more changes through ansible than we would like to. > > It's unlikely that the syntax for the logging configuration file will change since it's upstream Python, not OpenStack or Nova that is the source of this syntax. > > That said, if all you want to do is change some or all package default logging levels, you can change the value of the CONF.default_log_levels option. > > The default_log_levels CONF option is actually derived from the oslo_log package that is used by all OpenStack service projects. It's default value is here: > > https://github.com/openstack/oslo.log/blob/29671ef2bfacb416d397abc57170bb090b116f68/oslo_log/_options.py#L19-L31 > > So, if you don't want to mess with the standard Python logging conf, you can just change that CONF.default_log_levels option. Note that if you do specify a logging config file using a non-None CONF.log_config_append value, then all other logging configuration options (like default_log_levels) are ignored). > > Best, > -jay > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mihalis68 at gmail.com Tue May 7 15:18:14 2019 From: mihalis68 at gmail.com (Chris Morgan) Date: Tue, 7 May 2019 11:18:14 -0400 Subject: [ops] ops meetups team meeting 2019-5-7 Message-ID: Minute from todays meeting are linked below. A vote was taken to officially confirm acceptance of Bloomberg's offer to host the second ops meetup of 2019 and passed. There is also some news of possible further meetups in 2020 and discussion of how to structure ops events at future Open Infra Summits. reminder: key ops events are notified via : https://twitter.com/osopsmeetup Now up to 63 followers! Meeting ended Tue May 7 15:00:30 2019 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . 
(v 0.1.4) 11:00 AM O<•openstack> Minutes: http://eavesdrop.openstack.org/meetings/ops_meetup_team/2019/ops_meetup_team.2019-05-07-14.04.html 11:00 AM Minutes (text): http://eavesdrop.openstack.org/meetings/ops_meetup_team/2019/ops_meetup_team.2019-05-07-14.04.txt 11:00 AM Log: http://eavesdrop.openstack.org/meetings/ops_meetup_team/2019/ops_meetup_team.2019-05-07-14.04.log.html Chris -- Chris Morgan -------------- next part -------------- An HTML attachment was scrubbed... URL: From dmendiza at redhat.com Tue May 7 15:37:47 2019 From: dmendiza at redhat.com (=?UTF-8?Q?Douglas_Mendiz=c3=a1bal?=) Date: Tue, 7 May 2019 10:37:47 -0500 Subject: [nova][cinder][glance][Barbican]Finding Timeslot for weekly Image Encryption IRC meeting In-Reply-To: References: Message-ID: <6cdb30ba-888c-cd89-5bff-f432edb90467@redhat.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256 Hi Josephine, I think it's a great idea to have a recurring meeting to keep track of the Image Encryption effort. I tried to answer your doodle, but it seems that it does not have actual times, just dates? Maybe we need a new doodle? I live in the CDT (UTC-5) Time Zone if that helps. Thanks, - - Douglas Mendizábal (redrobot) On 5/4/19 1:57 PM, Josephine Seifert wrote: > Hello, > > as a result from the Summit and the PTG, I would like to hold a > weekly IRC-meeting for the Image Encryption (soon to be a pop-up > team). > > As I work in Europe I have made a doodle poll, with timeslots I > can attend and hopefully many of you. If you would like to join in > a weekly meeting, please fill out the poll and state your name and > the project you are working in: > https://doodle.com/poll/wtg9ha3e5dvym6yt > > Thank you Josephine (Luzi) > > > -----BEGIN PGP SIGNATURE----- iQEzBAEBCAAdFiEEan2ddQosxMRNS/FejiZC4mXuYkoFAlzRpksACgkQjiZC4mXu Ykqfawf7BngccaTpWzDNIipc697bjA2eg8guEYvEJ4KKlgl0vC7duY5Jn/7B/cKp wCFLtTA9V00pdBsdF0ZPOIeRAMlLkcx2BX2H6KqY/NzX0jB2xCtVem4PkAQcig/y 7ika3q/1SdRLKkbxA/07TtY5Obh7T0WUeK0WoylEgKW4YWLnWmMsD6lgcLzgfG1Z 2oDcjyVYShX9A+MVk4saLU3Zt9EG81WY81Y6iOElcj1MQGDY8Ukgc7m4/ykho3Du fZmj3IvxnE134ZGUECTKklmXeOgUWCcnUucIkyTKoAa/uXzxdxfdLT8MHHPxaGFa 6KGECV916VjY0ck32KmzbnpamUbdgw== =MOwN -----END PGP SIGNATURE----- From morgan.fainberg at gmail.com Tue May 7 15:38:41 2019 From: morgan.fainberg at gmail.com (Morgan Fainberg) Date: Tue, 7 May 2019 08:38:41 -0700 Subject: [keystone] reminder no irc meeting today, may 7 Message-ID: This is a reminder that there will be no weekly irc Keystone meeting this week so that everyone can recover post Summit and PTG [1]. Meetings will resume normally next week on May 14th. [1] http://lists.openstack.org/pipermail/openstack-discuss/2019-April/005531.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaypipes at gmail.com Tue May 7 15:39:07 2019 From: jaypipes at gmail.com (Jay Pipes) Date: Tue, 7 May 2019 11:39:07 -0400 Subject: [ops][nova]Logging in nova and other openstack projects In-Reply-To: References: Message-ID: As mentioned in my original response, if you have CONF.log_config_append set to anything, then the other conf options related to logging will be ignored. Best, -jay On Tue, May 7, 2019, 11:15 AM Jean-Philippe Méthot < jp.methot at planethoster.info> wrote: > Hi, > > I’ve just tried setting everything to warn through the nova.conf option > default_log_levels, as suggested. 
However, I’m still getting info level > logs from the resource tracker like this : > > INFO nova.compute.resource_tracker > > Could the compute resource tracker logs be managed by another parameter > than what’s in the default list for that configuration option? > > Best regards, > > Jean-Philippe Méthot > Openstack system administrator > Administrateur système Openstack > PlanetHoster inc. > > > > > Le 7 mai 2019 à 09:02, Jay Pipes a écrit : > > On 05/06/2019 05:56 PM, Jean-Philippe Méthot wrote: > > Hi, > We’ve been modifying our login habits for Nova on our Openstack setup to > try to send only warning level and up logs to our log servers. To do so, > I’ve created a logging.conf and configured logging according to the logging > module documentation. While what I’ve done works, it seems to be a very > convoluted process for something as simple as changing the logging level to > warning. We worry that if we upgrade and the syntax for this configuration > file changes, we may have to push more changes through ansible than we > would like to. > > > It's unlikely that the syntax for the logging configuration file will > change since it's upstream Python, not OpenStack or Nova that is the source > of this syntax. > > That said, if all you want to do is change some or all package default > logging levels, you can change the value of the CONF.default_log_levels > option. > > The default_log_levels CONF option is actually derived from the oslo_log > package that is used by all OpenStack service projects. It's default value > is here: > > > https://github.com/openstack/oslo.log/blob/29671ef2bfacb416d397abc57170bb090b116f68/oslo_log/_options.py#L19-L31 > > So, if you don't want to mess with the standard Python logging conf, you > can just change that CONF.default_log_levels option. Note that if you do > specify a logging config file using a non-None CONF.log_config_append > value, then all other logging configuration options (like > default_log_levels) are ignored). > > Best, > -jay > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From emilien at redhat.com Tue May 7 16:08:56 2019 From: emilien at redhat.com (Emilien Macchi) Date: Tue, 7 May 2019 18:08:56 +0200 Subject: [TripleO][Validations] Tag convention In-Reply-To: <3c383d8d-54fa-b054-f0ad-b97ed67ba03f@redhat.com> References: <3c383d8d-54fa-b054-f0ad-b97ed67ba03f@redhat.com> Message-ID: On Tue, May 7, 2019 at 4:44 PM Cédric Jeanneret wrote: > Dear all, > > We're currently working hard in order to provide a nice way to run > validations within a deploy (aka in-flight validations). > > We can already call validations provided by the tripleo-validations > package[1], it's working just fine. > > Now comes the question: "how can we disable the validations?". In order > to do that, we propose to use a standard tag in the ansible > roles/playbooks, and to add a "--skip-tags " when we disable the > validations via the CLI or configuration. > > After a quick check in the tripleoclient code, there apparently is a tag > named "validation", that can already be skipped from within the client. > > So, our questions: > - would the reuse of "validation" be OK? > - if not, what tag would be best in order to avoid confusion? > > We also have the idea to allow to disable validations per service. For > this, we propose to introduce the following tag: > - validation-, like "validation-nova", "validation-neutron" and > so on > > What do you think about those two additions? 
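(As a concrete illustration -- an untested sketch, and the role name below
is made up -- a validation task or role include would carry the tags:

    - name: Run nova in-flight validations
      include_role:
        name: validation-nova    # hypothetical role name
      tags:
        - validation
        - validation-nova

and they could then be skipped with something like
"ansible-playbook deploy_playbook.yml --skip-tags validation" to disable
all validations, or "--skip-tags validation-nova" to skip only the
nova-related ones.)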
> Such as variables, I think we should prefix all our variables and tags with tripleo_ or something, to differentiate them from any other playbooks our operators could run. I would rather use "tripleo_validations" and "tripleo_validation_nova" maybe. Wdyt? -- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... URL: From mnaser at vexxhost.com Tue May 7 16:24:42 2019 From: mnaser at vexxhost.com (Mohammed Naser) Date: Tue, 7 May 2019 12:24:42 -0400 Subject: [TripleO][Validations] Tag convention In-Reply-To: References: <3c383d8d-54fa-b054-f0ad-b97ed67ba03f@redhat.com> Message-ID: On Tue, May 7, 2019 at 12:12 PM Emilien Macchi wrote: > > > > On Tue, May 7, 2019 at 4:44 PM Cédric Jeanneret wrote: >> >> Dear all, >> >> We're currently working hard in order to provide a nice way to run >> validations within a deploy (aka in-flight validations). >> >> We can already call validations provided by the tripleo-validations >> package[1], it's working just fine. >> >> Now comes the question: "how can we disable the validations?". In order >> to do that, we propose to use a standard tag in the ansible >> roles/playbooks, and to add a "--skip-tags " when we disable the >> validations via the CLI or configuration. >> >> After a quick check in the tripleoclient code, there apparently is a tag >> named "validation", that can already be skipped from within the client. >> >> So, our questions: >> - would the reuse of "validation" be OK? >> - if not, what tag would be best in order to avoid confusion? >> >> We also have the idea to allow to disable validations per service. For >> this, we propose to introduce the following tag: >> - validation-, like "validation-nova", "validation-neutron" and >> so on >> >> What do you think about those two additions? > > > Such as variables, I think we should prefix all our variables and tags with tripleo_ or something, to differentiate them from any other playbooks our operators could run. > I would rather use "tripleo_validations" and "tripleo_validation_nova" maybe. Just chiming in here.. the pattern we like in OSA is using dashes for tags, I think having something like 'tripleo-validations' and 'tripleo-validations-nova' etc > Wdyt? > -- > Emilien Macchi -- Mohammed Naser — vexxhost ----------------------------------------------------- D. 514-316-8872 D. 800-910-1726 ext. 200 E. mnaser at vexxhost.com W. http://vexxhost.com From sundar.nadathur at intel.com Tue May 7 16:50:00 2019 From: sundar.nadathur at intel.com (Nadathur, Sundar) Date: Tue, 7 May 2019 16:50:00 +0000 Subject: [cyborg] No meetings this week Message-ID: <1CC272501B5BC543A05DB90AA509DED527557514@fmsmsx122.amr.corp.intel.com> Many of our developers are either jetlagged or have other conflicts, and prefer to reconvene later. Regards, Sundar -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeremyfreudberg at gmail.com Tue May 7 17:22:44 2019 From: jeremyfreudberg at gmail.com (Jeremy Freudberg) Date: Tue, 7 May 2019 13:22:44 -0400 Subject: [sahara] Cancelling Sahara meeting May 9 Message-ID: Hi all, There will be no Sahara meeting this upcoming Thursday, May 9. Holler if you need anything. Thanks, Jeremy From alifshit at redhat.com Tue May 7 17:47:01 2019 From: alifshit at redhat.com (Artom Lifshitz) Date: Tue, 7 May 2019 13:47:01 -0400 Subject: [nova][CI] GPUs in the gate Message-ID: Hey all, Following up on the CI session during the PTG [1], I wanted to get the ball rolling on getting GPU hardware into the gate somehow. 
Initially the plan was to do it through OpenLab and by convincing NVIDIA to donate the cards, but after a conversation with Sean McGinnis it appears Infra have access to machines with GPUs. >From Nova's POV, the requirements are: * The machines with GPUs should probably be Ironic baremetal nodes and not VMs [*]. * The GPUs need to support virtualization. It's hard to get a comprehensive list of GPUs that do, but Nova's own docs [2] mention two: Intel cards with GVT [3] and NVIDIA GRID [4]. So I think at this point the question is whether Infra can support those reqs. If yes, we can start concrete steps towards getting those machines used by a CI job. If not, we'll fall back to OpenLab and try to get them hardware. [*] Could we do double-passthrough? Could the card be passed through to the L1 guest via the PCI passthrough mechanism, and then into the L2 guest via the mdev mechanism? [1] https://etherpad.openstack.org/p/nova-ptg-train-ci [2] https://docs.openstack.org/nova/rocky/admin/virtual-gpu.html [3] https://01.org/igvt-g [4] https://docs.nvidia.com/grid/5.0/pdf/grid-vgpu-user-guide.pdf From dtantsur at redhat.com Tue May 7 17:47:57 2019 From: dtantsur at redhat.com (Dmitry Tantsur) Date: Tue, 7 May 2019 19:47:57 +0200 Subject: [ironic] My PTG & Forum notes Message-ID: <7313c6aa-1693-2cb0-4ed9-a73646764070@redhat.com> Hi folks, I've published my personal notes from the PTG & Forum in Denver: https://dtantsur.github.io/posts/ironic-denver-2019/ They're probably opinionated and definitely not complete, but I still think they could be useful. Also pasting the whole raw RST text below for ease of commenting. Cheers, Dmitry Keynotes ======== The `Metal3`_ project got some spotlight during the keynotes. A (successful!) `live demo`_ was done that demonstrated using Ironic through Kubernetes API to drive provisioning of bare metal nodes. The official `bare metal program`_ was announced to promote managing bare metal infrastructure via OpenStack. Forum: standalone Ironic ======================== On Monday we had two sessions dedicated to the future development of standalone Ironic (without Nova or without any other OpenStack services). During the `standalone roadmap session`_ the audience identified two potential domains where we could provide simple alternatives to depending on OpenStack services: * Alternative authentication. It was mentioned, however, that Keystone is a relatively easy service to install and operate, so adding this to Ironic may not be worth the effort. * Multi-tenant networking without Neutron. We could use networking-ansible_ directly, since they are planning on providing a Python API independent of their ML2 implementation. Next, firmware update support was a recurring topic (also in hallway conversations and also in non-standalone context). Related to that, a driver feature matrix documentation was requested, so that such driver-specific features are easier to discover. Then we had a separate `API multi-tenancy session`_. Three topic were covered: * Wiring in the existing ``owner`` field for access control. The idea is to allow operations for non-administrator users only to nodes with ``owner`` equal to their project (aka tenant) ID. In the non-keystone context this field would stay free-form. We did not agree whether we need an option to enable this feature. An interesting use case was mentioned: assign a non-admin user to Nova to allocate it only a part of the bare metal pool instead of all nodes. 
We did not reach a consensus on using a schema with the ``owner`` field, e.g. where ``keystone://{project ID}`` represents a Keystone project ID. * Adding a new field (e.g. ``deployed_by``) to track a user that requested deploy for auditing purposes. We agreed that the ``owner`` field should not be used for this purpose, and overall it should never be changed automatically by Ironic. * Adding some notion of *node leased to*, probably via a new field. This proposal was not well defined during the session, but we probably would allow some subset of API to lessees using the policy mechanism. It became apparent that implementing a separate *deployment API endpoint* is required to make such policy possible. Creating the deployment API was identified as a potential immediate action item. Wiring the ``owner`` field can also be done in the Train cycle, if we find volunteers to push it forward. PTG: scientific SIG =================== The PTG started for me with the `Scientific SIG discussions`_ of desired features and fixes in Ironic. The hottest topic was reducing the deployment time by reducing the number of reboots that are done during the provisioning process. `Ramdisk deploy`_ was identified as a very promising feature to solve this, as well as enable booting from remote volumes not supported directly by Ironic and/or Cinder. A few SIG members committed to testing it as soon as possible. Two related ideas were proposed for later brainstorming: * Keeping some proportion of nodes always on and with IPA booted. This is basing directly on the `fast-track deploy`_ work completed in the Stein cycle. A third party orchestrator would be needed for keeping the percentage, but Ironic will have to provide an API to boot an ``available`` node into the ramdisk. * Allow using *kexec* to instantly switch into a freshly deployed operating system. Combined together, these features can allow zero-reboot deployments. PTG: Ironic =========== Community sustainability ------------------------ We seem to have a disbalance in reviews, with very few people handling the majority of reviews, and some of them are close to burning out. * The first thing we discussed is simplifying the specs process. We considered a single +2 approval for specs and/or documentation. Approving documentation cannot break anyone, and follow-ups are easy, so it seems a good idea. We did not reach a firm agreement on a single +2 approval for specs; I personally feel that it would only move the bottleneck from specs to the code. * Facilitating deprecated feature removals can help clean up the code, and it can often be done by new contributors. We would like to maintain a list of what can be removed when, so that we don't forget it. * We would also like to switch to single +2 for stable backports. This needs changing the stable policy, and Tony volunteered to propose it. We felt that we're adding cores at a good pace, Julia had been mentoring people that wanted it. We would like people to volunteer, then we can mentor them into core status. However, we were not so sure we wanted to increase the stable core team. This team is supposed to be a small number of people that know quite a few small details of the stable policy (e.g. requirements changes). We thought we should better switch to single +2 approval for the existing team. Then we discussed moving away from WSME, which is barely maintained by a team of not really interested individuals. The proposal was to follow the example of Keystone and just move to Flask. 
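To make the idea a bit more concrete, a rough sketch of a Flask route with
JSON schema validation could look like this (purely illustrative, not actual
Ironic code; the route and schema below are made up):

.. code-block:: python

    import flask
    import jsonschema

    bp = flask.Blueprint('nodes', __name__)

    NODE_SCHEMA = {
        'type': 'object',
        'properties': {
            'name': {'type': 'string'},
            'driver': {'type': 'string'},
        },
        'required': ['driver'],
        'additionalProperties': False,
    }

    @bp.route('/v1/nodes', methods=['POST'])
    def create_node():
        body = flask.request.get_json(force=True)
        try:
            jsonschema.validate(body, NODE_SCHEMA)
        except jsonschema.ValidationError as exc:
            flask.abort(400, description=exc.message)
        # hand the validated request off to the conductor API here
        return flask.jsonify(body), 201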
We can use ironic-inspector as an example, and probably migrate part by part. JSON schema could replace WSME objects, similarly to how Nova does it. I volunteered to come up with a plan to switch, and some folks from Intel expressed interest in participating. Standalone roadmap ------------------ We started with a recap of items from `Forum: standalone Ironic`_. While discussing creating a driver matrix, we realized that we could keep driver capabilities in the source code (similar to existing iSCSI boot) and generate the documentation from it. Then we could go as far as exposing this information in the API. During the multi-tenancy discussion, the idea of owner and lessee fields was well received. Julia volunteered to write a specification for that. We clarified the following access control policies implemented by default: * A user can list or show nodes if they are an administrator, an owner of a node or a leaser of this node. * A user can deploy or undeploy a node (through the future deployment API) if they are an administrator, an owner of this node or a lessee of this node. * A user can update a node or any of its resources if they are an administrator or an owner of this node. A lessee of a node can **not** update it. The discussion of recording the user that did a deployment turned into discussing introducing a searchable log of changes to node power and provision states. We did not reach a final consensus on it, and we probably need a volunteer to push this effort forward. Deploy steps continued ---------------------- This session was dedicated to making the deploy templates framework more usable in practice. * We need to implement support for in-band deploy steps (other than the built-in ``deploy.deploy`` step). We probably need to start IPA before proceeding with the steps, similarly to how it is done with cleaning. * We agreed to proceed with splitting the built-in core step, making it a regular deploy step, as well as removing the compatibility shim for drivers that do not support deploy steps. We will probably separate writing an image to disk, writing a configdrive and creating a bootloader. The latter could be overridden to provide custom kernel parameters. * To handle potential differences between deploy steps in different hardware types, we discussed the possibility of optionally including a hardware type or interface name in a clean step. Such steps will only be run for nodes with matching hardware type or interface. Mark and Ruby volunteered to write a new spec on these topics. Day 2 operational workflow -------------------------- For deployments with external health monitoring, we need a way to represent the state when a deployed node looks healthy from our side but is detected as failed by the monitoring. It seems that we could introduce a new state transition from ``active`` to something like ``failed`` or ``quarantined``, where a node is still deployed, but explicitly marked as at fault by an operator. On unprovisioning, this node would not become ``available`` automatically. We also considered the possibility of using a flag instead of a new state, although the operators in the room were more in favor of using a state. We largely agreed that the already overloaded ``maintenance`` flag should not be used for this. On the Nova side we would probably use the ``error`` state to reflect nodes in the new state. A very similar request had been done for node retirement support. We decided to look for a unified solution. 
DHCP-less deploy
----------------

We discussed options to avoid relying on DHCP for deploying.

* An existing specification proposes attaching IP information to virtual
  media. The initial contributors had become inactive, so we decided to help
  this work go through. Volunteers are welcome.

* As an alternative to that, we discussed using IPv6 SLAAC with multicast
  DNS (routed across WAN for Edge cases). A couple of folks in the room
  volunteered to help with testing. We need to fix python-zeroconf_ to
  support IPv6, which is something I'm planning on.

Nova room
---------

In a cross-project discussion with the Nova team we went through a few
topics:

* Whether Nova should use new Ironic API to build config drives. Since
  Ironic is not the only driver building config drives, we agreed that it
  probably doesn't make much sense to change that.

* We did not come to a conclusion on deprecating capabilities. We agreed
  that Ironic has to provide alternatives for ``boot_option`` and
  ``boot_mode`` capabilities first. These will probably become deploy steps
  or built-in traits.

* We agreed that we should switch Nova to using *openstacksdk* instead of
  *ironicclient* to access Ironic. This work had already been in progress.

Faster deploy
-------------

We followed up on `PTG: scientific SIG`_ with potential action items on
speeding up the deployment process by reducing the number of reboots.

We discussed an ability to keep all or some nodes powered on and
heartbeating in the ``available`` state:

* Add an option to keep the ramdisk running after cleaning.

* For this to work with multi-tenant networking we'll need an IPA command
  to reset networking.

* Add a provisioning verb going from ``available`` to ``available``,
  booting the node into IPA.

* Make sure that pre-booted nodes are prioritized for scheduling. We will
  probably dynamically add a special trait. Then we'll have to update both
  Nova/Placement and the allocation API to support preferred (optional)
  traits.

We also agreed that we could provide an option to *kexec* instead of
rebooting as an advanced deploy step for operators that really know their
hardware. Multi-tenant networking can be tricky in this case, since there
is no safe point to switch from deployment to tenant network. We will
probably take a best-effort approach: command IPA to shut down all its
functionality and schedule a *kexec* after some time. After that, switch to
tenant networks. This is not entirely secure, but will probably fit the
operators (HPC) who request it.

Asynchronous clean steps
------------------------

We discussed enhancements for asynchronous clean and deploy steps.
Currently, running a step asynchronously requires either polling in a loop
(occupying a green thread) or creating a new periodic task in a hardware
type. We came up with two proposed updates for clean steps:

* Allow a clean step to request re-running itself after a certain amount of
  time. E.g. a clean step would do something like

  .. code-block:: python

     @clean_step(...)
     def wait_for_raid(self):
         if not raid_is_ready():
             return RerunAfter(60)

  and the conductor would schedule re-running the same step in 60 seconds.

* Allow a clean step to spawn more clean steps. E.g. a clean step would do
  something like

  .. code-block:: python

     @clean_step(...)
     def create_raid_configuration(self):
         start_create_raid()
         return RunNext([{'step': 'wait_for_raid'}])

  and the conductor would insert the provided step into ``node.clean_steps``
  after the current one and start running it. This would allow for several
  follow-up steps as well.
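To sketch how the conductor side of this could look (again purely
illustrative; ``RerunAfter``, ``RunNext`` and the helper functions below are
hypothetical):

.. code-block:: python

    def handle_clean_step_result(task, current_step, result):
        # Illustrative only, not actual Ironic code.
        if isinstance(result, RerunAfter):
            # Re-schedule the same step after the requested delay instead
            # of blocking a conductor thread while waiting.
            schedule_clean_step(task, current_step, delay=result.seconds)
        elif isinstance(result, RunNext):
            # Insert the follow-up steps right after the current one and
            # resume cleaning from there.
            steps = task.node.clean_steps
            index = steps.index(current_step)
            steps[index + 1:index + 1] = result.steps
            resume_cleaning(task)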
A use case is a clean step for resetting iDRAC to a clean state that in turn consists of several other clean steps. The idea of sub-steps was deemed too complicated. PTG: TripleO ============ We discussed our plans for removing Nova from the TripleO undercloud and moving bare metal provisioning from under control of Heat. The plan from the `nova-less-deploy specification`_, as well as the current state of the implementation, were presented. The current concerns are: * upgrades from a Nova based deployment (probably just wipe the Nova database), * losing user experience of ``nova list`` (largely compensated by ``metalsmith list``), * tracking IP addresses for networks other than *ctlplane* (solved the same way as for deployed servers). The next action item is to create a CI job based on the already merged code and verify a few assumptions made above. PTG: Ironic, Placement, Blazar ============================== We reiterated over our plans to allow Ironic to optionally report nodes to Placement. This will be turned off when Nova is present to avoid conflicts with the Nova reporting. We will optionally use Placement as a backend for Ironic allocation API (which is something that had been planned before). Then we discussed potentially exposing detailed bare metal inventory to Placement. To avoid partial allocations, Placement could introduce new API to consume the whole resource provider. Ironic would use it when creating an allocation. No specific commitments were made with regards to this idea. Finally we came with the following workflow for bare metal reservations in Blazar: #. A user requests a bare metal reservation from Blazar. #. Blazar fetches allocation candidates from Placement. #. Blazar fetches a list of bare metal nodes from Ironic and filters out allocation candidates, whose resource provider UUID does not match one of the node UUIDs. #. Blazar remembers the node UUID and returns the reservation UUID to the user. When the reservation time comes: #. Blazar creates an allocation in Ironic (not Placement) with the candidate node matching previously picked node and allocation UUID matching the reservation UUID. #. When the enhancements in `Standalone roadmap`_ are implemented, Blazar will also set the node's lessee field to the user ID of the reservation, so that Ironic allows access to this node. #. A user fetches an Ironic allocation corresponding to the Blazar reservation UUID and learns the node UUID from it. #. A user proceeds with deploying the node. Side and hallway discussions ============================ * We discussed having Heat resources for Ironic. We recommended the team to start with Allocation and Deployment resources (the latter being virtual until we implement the planned deployment API). * We prototyped how Heat resources for Ironic could look, including Node, Port, Allocation and Deployment as a first step. .. _Metal3: http://metal3.io .. _live demo: https://www.openstack.org/videos/summits/denver-2019/openstack-ironic-and-bare-metal-infrastructure-all-abstractions-start-somewhere .. _bare metal program: https://www.openstack.org/bare-metal/ .. _standalone roadmap session: https://etherpad.openstack.org/p/DEN-train-next-steps-for-standalone-ironic .. _networking-ansible: https://opendev.org/x/networking-ansible .. _API multi-tenancy session: https://etherpad.openstack.org/p/DEN-train-ironic-multi-tenancy .. _Scientific SIG discussions: https://etherpad.openstack.org/p/scientific-sig-ptg-train .. 
_Ramdisk deploy: https://docs.openstack.org/ironic/latest/admin/interfaces/deploy.html#ramdisk-deploy .. _fast-track deploy: https://storyboard.openstack.org/#!/story/2004965 .. _python-zeroconf: https://github.com/jstasiak/python-zeroconf .. _nova-less-deploy specification: http://specs.openstack.org/openstack/tripleo-specs/specs/stein/nova-less-deploy.html From aspiers at suse.com Tue May 7 18:16:14 2019 From: aspiers at suse.com (Adam Spiers) Date: Tue, 7 May 2019 19:16:14 +0100 Subject: [nova][all][ptg] Summary: Same-Company Approvals In-Reply-To: References: Message-ID: <20190507181614.2s3qb3gopzvryt7o@pacific.linksys.moosehall> Morgan Fainberg wrote: >On Sat, May 4, 2019, 16:48 Eric Fried wrote: >> (NB: I tagged [all] because it would be interesting to know where other >> teams stand on this issue.) >> >> Etherpad: https://etherpad.openstack.org/p/nova-ptg-train-governance I didn't pipe up during the PTG discussion because a) I missed the first 5-10 minutes and hence probably some important context, and b) I've not been a nova contributor long enough to be well-informed on this topic. Apologies if that was the wrong decision. But I do have a few thoughts on this, which I'll share below. Given b), take them with a pinch of salt ;-) Firstly, I was impressed with the way this topic was raised and discussed, and I think that is a very encouraging indicator for the current health of nova contributor culture. We're in a good place :-) >> Summary: >> - There is a (currently unwritten? at least for Nova) rule that a patch >> should not be approved exclusively by cores from the same company. This >> is rife with nuance, including but not limited to: >> - Usually (but not always) relevant when the patch was proposed by >> member of same company >> - N/A for trivial things like typo fixes >> - The issue is: >> - Should the rule be abolished? and/or >> - Should the rule be written down? >> >> Consensus (not unanimous): [snipped] >Keystone used to have the same policy outlined in this email (with much of >the same nuance and exceptions). Without going into crazy details (as the >contributor and core numbers went down), we opted to really lean on "Overall, >we should be able to trust cores to act in good faith". We abolished the >rule and the cores always ask for outside input when the familiarity lies >outside of the team. We often also pull in cores more familiar with the >code sometimes ending up with 3x+2s before we workflow the patch. > >Personally I don't like the "this is an >unwritten rule and it shouldn't be documented"; if documenting and >enforcement of the rule elicits worry of gaming the system or being a dense >some not read, in my mind (and experience) the rule may not be worth >having. I voice my opinion with the caveat that every team is different. If >the rule works, and helps the team (Nova in this case) feel more confident >in the management of code, the rule has a place to live on. What works for >one team doesn't always work for another. +1 - I'm not wildly enthusiastic about the "keep it undocumented" approach either. Here's my stab at handling some of the objections to a written policy. >> - The rule should not be documented (this email notwithstanding). This >> would either encourage loopholing I don't see why the presence of a written rule would encourage people to deliberately subvert upstream trust any more than they might otherwise do. And a rule with loopholes is still a better deterrent than no rule at all. 
This is somewhat true for deliberate subversions of trust (which I expect are non-existent or at least extremely rare), but especially true for accidental subversions of trust which could otherwise happen quite easily due to not fully understanding how upstream works. >> or turn into a huge detailed legal tome that nobody will read. I don't think it has to. It's not a legal document, so there's no need to attempt to make it like one. If there are loopholes which can't easily be covered by a simple rewording, then so be it. If the policy only catches 50% of cases, it's still helping. So IMHO the existence of loopholes doesn't justify throwing the baby out with the bathwater. >> It would also *require* enforcement, which >> is difficult and awkward. Overall, we should be able to trust cores to >> act in good faith and in the appropriate spirit. I agree that enforcement would be difficult and awkward, and that we should be able to trust cores. But in the unlikely and unfortunate situation that a problem arose in this space, the lack of a written policy wouldn't magically solve that problem. in fact it would make it even *harder* to deal with, because there'd be nothing to point to in order to help explain to the offender what they were doing wrong. That would automatically make any judgement appear more subjective than objective, and therefore more prone to being taken personally. From pawel.konczalski at everyware.ch Tue May 7 19:10:54 2019 From: pawel.konczalski at everyware.ch (Pawel Konczalski) Date: Tue, 7 May 2019 21:10:54 +0200 Subject: Magnum Kubernetes openstack-cloud-controller-manager unable not resolve master node by DNS Message-ID: Hi, i try to deploy a Kubernetes cluster with OpenStack Magnum but the openstack-cloud-controller-manager pod fails to resolve the master node hostname. Does magnum require further parameter to configure the DNS names of the master and minions? DNS resolution in the VMs works fine. Currently there is no Designate installed in the OpenStack setup. 
openstack coe cluster template create kubernetes-cluster-template1 \
  --image Fedora-AtomicHost-29-20190429.0.x86_64 \
  --external-network public \
  --dns-nameserver 8.8.8.8 \
  --master-flavor m1.kubernetes \
  --flavor m1.kubernetes \
  --coe kubernetes \
  --volume-driver cinder \
  --network-driver flannel \
  --docker-volume-size 25

openstack coe cluster create kubernetes-cluster1 \
  --cluster-template kubernetes-cluster-template1 \
  --master-count 1 \
  --node-count 2 \
  --keypair mykey

# kubectl get pods --all-namespaces -o wide
NAMESPACE     NAME                                       READY   STATUS             RESTARTS   AGE   IP   NODE                                        NOMINATED NODE
kube-system   coredns-78df4bf8ff-mjp2c                   0/1     Pending            0          36m
kube-system   heapster-74f98f6489-tgtzl                  0/1     Pending            0          36m
kube-system   kube-dns-autoscaler-986c49747-wrvz4        0/1     Pending            0          36m
kube-system   kubernetes-dashboard-54cb7b5997-sk5pj      0/1     Pending            0          36m
kube-system   openstack-cloud-controller-manager-dgk64   0/1     CrashLoopBackOff   11         36m        kubernetes-cluster1-vulg5fz6hg2n-master-0

# kubectl -n kube-system logs openstack-cloud-controller-manager-dgk64
Error from server: Get https://kubernetes-cluster1-vulg5fz6hg2n-master-0:10250/containerLogs/kube-system/openstack-cloud-controller-manager-dgk64/openstack-cloud-controller-manager: dial tcp: lookup kubernetes-cluster1-vulg5fz6hg2n-master-0 on 8.8.8.8:53: no such host

BR Pawel

-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 5227 bytes
Desc: not available
URL: 

From jungleboyj at gmail.com Tue May 7 20:06:10 2019
From: jungleboyj at gmail.com (Jay Bryant)
Date: Tue, 7 May 2019 15:06:10 -0500
Subject: [nova][all][ptg] Summary: Same-Company Approvals
In-Reply-To: 
References: 
Message-ID: 

All,

Cinder has been working with the same unwritten rules for quite some time as well, with minimal issues.

I think the concerns about not having it documented are warranted. We have had questions about it in the past with no documentation to point to. It is more or less lore that has been passed down over the releases. :-)

At a minimum, having this e-mail thread is helpful. If, however, we decide to document it, I think we should keep it consistent across the teams that use the rule. I would be happy to help draft/review any such documentation.

Jay

On 5/4/2019 8:19 PM, Morgan Fainberg wrote:
>
>
> On Sat, May 4, 2019, 16:48 Eric Fried wrote:
>
> (NB: I tagged [all] because it would be interesting to know where
> other
> teams stand on this issue.)
>
> Etherpad: https://etherpad.openstack.org/p/nova-ptg-train-governance
>
> Summary:
> - There is a (currently unwritten? at least for Nova) rule that a
> patch
> should not be approved exclusively by cores from the same company.
> This
> is rife with nuance, including but not limited to:
>   - Usually (but not always) relevant when the patch was proposed by
> member of same company
>   - N/A for trivial things like typo fixes
> - The issue is:
>   - Should the rule be abolished? and/or
>   - Should the rule be written down?
>
> Consensus (not unanimous):
> - The rule should not be abolished.
There are cases where both the > impetus and the subject matter expertise for a patch all reside within > one company. In such cases, at least one core from another company > should still be engaged and provide a "procedural +2" - much like > cores > proxy SME +1s when there's no core with deep expertise. > - If there is reasonable justification for bending the rules (e.g. > typo > fixes as noted above, some piece of work clearly not related to the > company's interest, unwedging the gate, etc.) said justification > should > be clearly documented in review commentary. > - The rule should not be documented (this email notwithstanding). This > would either encourage loopholing or turn into a huge detailed legal > tome that nobody will read. It would also *require* enforcement, which > is difficult and awkward. Overall, we should be able to trust cores to > act in good faith and in the appropriate spirit. > > efried > . > > > Keystone used to have the same policy outlined in this email (with > much of the same nuance and exceptions). Without going into crazy > details (as the contributor and core numbers went down), we opted to > really lean on "Overall, we should be able to trust cores to act in > good faith". We abolished the rule and the cores always ask for > outside input when the familiarity lies outside of the team. We often > also pull in cores more familiar with the code sometimes ending up > with 3x+2s before we workflow the patch. > > Personally I don't like the "this is an > unwritten rule and it shouldn't be documented"; if documenting and > enforcement of the rule elicits worry of gaming the system or being a > dense some not read, in my mind (and experience) the rule may not be > worth having. I voice my opinion with the caveat that every team is > different. If the rule works, and helps the team (Nova in this case) > feel more confident in the management of code, the rule has a place to > live on. What works for one team doesn't always work for another. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jungleboyj at gmail.com Tue May 7 20:22:25 2019 From: jungleboyj at gmail.com (Jay Bryant) Date: Tue, 7 May 2019 15:22:25 -0500 Subject: [cinder][ops] Nested Quota Driver Use? In-Reply-To: <20190507142046.GA3999@sm-workstation> References: <20190502003249.GA1432@sm-workstation> <20190507142046.GA3999@sm-workstation> Message-ID: On 5/7/2019 9:20 AM, Sean McGinnis wrote: > On Fri, May 03, 2019 at 06:58:41PM +0000, Tim Bell wrote: >> We're interested in the overall functionality but I think unified limits is the place to invest and thus would not have any problem deprecating this driver. >> >> We'd really welcome this being implemented across all the projects in a consistent way. The sort of functionality proposed in https://techblog.web.cern.ch/techblog/post/nested-quota-models/ would need Nova/Cinder/Manila at miniumum for CERN to switch. >> >> So, no objections to deprecation but strong support to converge on unified limits. >> >> Tim >> > Thanks Tim, that helps. > > Since there wasn't any other feedback, and no one jumping up to say they are > using it today, I have submitted https://review.opendev.org/657511 to > deprecated the current quota driver so we don't have to try to refactor that > functionality into whatever we need to do for the unified limits support. > > If anyone has any concerns about this plan, please feel free to raise them here > or on that review. > > Thanks! 
> Sean Sean, If I remember correctly, IBM had put some time into trying to fix the nested quota driver back around the Kilo or Liberty release. I haven't seen much activity since then. I am in support deprecating the driver and going to unified limits given that that appears to be the general direction of OpenStack. Jay From mthode at mthode.org Tue May 7 20:30:22 2019 From: mthode at mthode.org (Matthew Thode) Date: Tue, 7 May 2019 15:30:22 -0500 Subject: [all][requirements][stable] requests version bump on stable brances {pike|queens} for CVE-2018-18074 Message-ID: <20190507203022.ctlwkqh4awa5z3ez@mthode.org> Hi all, This is a warning and call to test the requests updates linked below. The best way to test is to make a dummy review in your project that depends on the linked review (either Pike or Queens). Upstream has no intrest or (easy) ability to backport the patch. Please let us know either in the the #openstack-requirements channel or in this email thread if you have issues. Pike - 2.18.2 -> 2.20.1 - https://review.opendev.org/640727 Queens - 2.18.4 -> 2.20.1 - https://review.opendev.org/640710 -- Matthew Thode -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From rodrigodsousa at gmail.com Tue May 7 20:30:51 2019 From: rodrigodsousa at gmail.com (Rodrigo Duarte) Date: Tue, 7 May 2019 13:30:51 -0700 Subject: [dev][keystone][ptg] Keystone team action items In-Reply-To: References: Message-ID: Thanks for the summary, Colleen. On Sun, May 5, 2019 at 8:59 AM Colleen Murphy wrote: > Hi everyone, > > I will write an in-depth summary of the Forum and PTG some time in the > coming week, but I wanted to quickly capture all the action items that came > out of the last six days so that we don't lose too much focus: > > Colleen > * move "Expand endpoint filters to Service Providers" spec[1] to attic > * review "Policy Goals"[2] and "Policy Security Roadmap"[3] specs with > Lance, refresh and possibly combine them > * move "Unified model for assignments, OAuth, and trusts" spec[4] from > ongoing to backlog, and circle up with Adam about refreshing it > * update app creds spec[5] to defer access_rules_config > * review app cred documentation with regard to proactive rotation > * follow up with nova/other service teams on need for microversion support > in access rules > * circle up with Guang on fixing autoprovisioning for tokenless auth > * keep up to date with IEEE/NIST efforts on standardizing federation > * investigate undoing the foreign key constraint that breaks the pluggable > resource driver > * propose governance change to add caching as a base service > * clean out deprecated cruft from keystonemiddleware > * write up Outreachy/other internship application tasks > > [1] > http://specs.openstack.org/openstack/keystone-specs/specs/keystone/backlog/service-providers-filters.html > [2] > http://specs.openstack.org/openstack/keystone-specs/specs/keystone/ongoing/policy-goals.html > [3] > http://specs.openstack.org/openstack/keystone-specs/specs/keystone/ongoing/policy-security-roadmap.html > [4] > http://specs.openstack.org/openstack/keystone-specs/specs/keystone/ongoing/unified-delegation.html > [5] > http://specs.openstack.org/openstack/keystone-specs/specs/keystone/train/capabilities-app-creds.html > > Lance > * write up plan for tempest testing of system scope > * break up unified limits testing plan into separate items, one for CRUD > in keystone and one for quota 
and limit validation in oslo.limit[6] > * write up spec for assigning roles on root domain > * (with Morgan) check for and add interface in oslo.policy to see if > policy has been overridden > > [6] https://trello.com/c/kbKvhYBz/20-test-unified-limits-in-tempest > > Kristi > * finish mutable config patch > * propose "model-timestamps" spec for Train[7] > * move "Add Multi-Version Support to Federation Mappings" spec[8] to attic > * review and possibly complete "Devstack Plugin for Keystone" spec[9] > * look into "RFE: Improved OpenID Connect Support" spec[10] > * update refreshable app creds spec[11] to make federated users expire > rather then app creds > * deprecate federated_domain_name > > [7] > http://specs.openstack.org/openstack/keystone-specs/specs/keystone/backlog/model-timestamps.html > [8] > http://specs.openstack.org/openstack/keystone-specs/specs/keystone/backlog/versioned-mappings.html > [9] > http://specs.openstack.org/openstack/keystone-specs/specs/keystone/ongoing/devstack-plugin.html > [10] https://bugs.launchpad.net/keystone/+bug/1815971 > [11] https://review.opendev.org/604201 > > Vishakha > * investigate effort needed for Alembic migrations spec[12] (with help > from Morgan) > * merge "RFE: Retrofit keystone-manage db_* commands to work with > Alembic"[13] into "Use Alembic for database migrations" spec > * remove deprecated [signing] config > * remove deprecated [DEFAULT]/admin_endpoint config > * remove deprecated [token]/infer_roles config > > [12] > http://specs.openstack.org/openstack/keystone-specs/specs/keystone/backlog/alembic.html > [13] https://bugs.launchpad.net/keystone/+bug/1816158 > > Morgan > * review "Materialize Project Hierarchy" spec[14] and make sure it > reflects the current state of the world, keep it in the backlog > * move "Functional Testing" spec[15] to attic > * move "Object Dependency Lifecycle" spec[16] to complete > * move "Add Endpoint Filter Enforcement to Keystonemiddleware" spec[17] to > attic > * move "Request Helpers" spec[18] to attic > * create PoC of external IdP proxy component > * (with Lance) check for and add interface in oslo.policy to see if policy > has been overridden > * investigate removing [eventlet_server] config section > * remove remaining PasteDeploy things > * remove PKI(Z) cruft from keystonemiddleware > * refactor keystonemiddleware to have functional components instead of > needing keystone to instantiate keystonemiddleware objects for auth > > [14] > http://specs.openstack.org/openstack/keystone-specs/specs/keystone/backlog/materialize-project-hierarchy.html > [15] > http://specs.openstack.org/openstack/keystone-specs/specs/keystone/ongoing/functional-testing.html > [16] > http://specs.openstack.org/openstack/keystone-specs/specs/keystone/ongoing/object-dependency-lifecycle.html > [17] > http://specs.openstack.org/openstack/keystone-specs/specs/keystonemiddleware/backlog/endpoint-enforcement-middleware.html > [18] > http://specs.openstack.org/openstack/keystone-specs/specs/keystonemiddleware/backlog/request-helpers.html > > Gage > * investigate with operators about specific use case behind "RFE: > Whitelisting (opt-in) users/projects/domains for PCI compliance"[19] request > * follow up on "RFE: Token returns Project's tag properties"[20] > * remove use of keystoneclient from keystonemiddleware > > [19] https://bugs.launchpad.net/keystone/+bug/1637146 > [20] https://bugs.launchpad.net/keystone/+bug/1807697 > > Rodrigo > * Propose finishing "RFE: Project Tree Deletion/Disabling"[21] as an > Outreachy project > > 
[21] https://bugs.launchpad.net/keystone/+bug/1816105 > > Adam > * write up super-spec on explicit project IDs plus predictable IDs > > > Thanks everyone for a productive week and for all your hard work! > > Colleen > > -- Rodrigo http://rodrigods.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From mthode at mthode.org Tue May 7 20:35:36 2019 From: mthode at mthode.org (Matthew Thode) Date: Tue, 7 May 2019 15:35:36 -0500 Subject: [all][requirements][stable] requests version bump on stable brances {pike|queens} for CVE-2018-18074 In-Reply-To: <20190507203022.ctlwkqh4awa5z3ez@mthode.org> References: <20190507203022.ctlwkqh4awa5z3ez@mthode.org> Message-ID: <20190507203536.w7uf2kh6qpvkhcgy@mthode.org> On 19-05-07 15:30:22, Matthew Thode wrote: > Hi all, > > This is a warning and call to test the requests updates linked below. > The best way to test is to make a dummy review in your project that > depends on the linked review (either Pike or Queens). Upstream has no > intrest or (easy) ability to backport the patch. > > Please let us know either in the the #openstack-requirements channel or > in this email thread if you have issues. > > Pike - 2.18.2 -> 2.20.1 - https://review.opendev.org/640727 > Queens - 2.18.4 -> 2.20.1 - https://review.opendev.org/640710 > Forgot to set the timeline for merging those reviews, the current plan is to merge them Tuesday Morning (May 14th) either EU or US time. -- Matthew Thode -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From johnsomor at gmail.com Tue May 7 20:39:20 2019 From: johnsomor at gmail.com (Michael Johnson) Date: Tue, 7 May 2019 13:39:20 -0700 Subject: OpenStack User Survey 2019 In-Reply-To: <74F9B988-972B-422F-94D1-E62A83FD87A7@openstack.org> References: <5CC0732E.8020601@tipit.net> <74F9B988-972B-422F-94D1-E62A83FD87A7@openstack.org> Message-ID: Jimmy & Allison, As you probably remember from previous year's surveys, the Octavia team has been trying to get a question included in the survey for a while. I have included the response we got the last time we inquired about the survey below. We never received a follow up invitation. I think it would be in the best interest for the community if we follow our "Four Opens" ethos in the user survey process, specifically the "Open Community" statement, by soliciting survey questions from the project teams in an open forum such as the openstack-discuss mailing list. Michael ----- Last response e-mail ------ Jimmy McArthur Fri, Sep 7, 2018, 5:51 PM to Allison, me Hey Michael, The project-specific questions were added in 2017, so likely didn't include some new projects. While we asked all projects to participate initially, less than a dozen did. We will be sending an invitation for new/underrepresented projects in the coming weeks. Please stand by and know that we value your feedback and that of the community. Cheers! On Sat, Apr 27, 2019 at 5:11 PM Allison Price wrote: > > Hi Michael, > > We reached out to all of the PTLs who had questions in the 2018 version of the survey to review and update their questions. If there is a project that was missed, we can add it and share anonymized results with the PTLs directly as well as the openstack-discsuss mailing list. > > If there is a question from the Octavia team, please let us know and we can add it for the 2019 survey. 
> > Cheers, > Allison > > > > On Apr 27, 2019, at 4:01 PM, Michael Johnson wrote: > > Jimmy, > > I am curious, how did you reach out the PTLs for project specific > questions? The Octavia team didn't receive any e-mail from you or > Allison on the topic. > > Michael > > From allison at openstack.org Tue May 7 20:50:10 2019 From: allison at openstack.org (Allison Price) Date: Tue, 7 May 2019 15:50:10 -0500 Subject: OpenStack User Survey 2019 In-Reply-To: References: <5CC0732E.8020601@tipit.net> <74F9B988-972B-422F-94D1-E62A83FD87A7@openstack.org> Message-ID: Hi Michael, I apologize that the Octavia project team has been unable to submit a question to date. Jimmy posted the User Survey update to the public mailing list to ensure we updated the entire community and that we caught any projects that had not submitted their questions. The User Survey is open all year, and the primary goal is passing operator feedback to the upstream community. If the Octavia team - or any OpenStack project team - has a question they would like added (limit of 2 per project), please let Jimmy or myself know. Thanks for reaching out, Michael. Cheers, Allison > On May 7, 2019, at 3:39 PM, Michael Johnson wrote: > > Jimmy & Allison, > > As you probably remember from previous year's surveys, the Octavia > team has been trying to get a question included in the survey for a > while. > I have included the response we got the last time we inquired about > the survey below. We never received a follow up invitation. > > I think it would be in the best interest for the community if we > follow our "Four Opens" ethos in the user survey process, specifically > the "Open Community" statement, by soliciting survey questions from > the project teams in an open forum such as the openstack-discuss > mailing list. > > Michael > > ----- Last response e-mail ------ > Jimmy McArthur > > Fri, Sep 7, 2018, 5:51 PM > to Allison, me > Hey Michael, > > The project-specific questions were added in 2017, so likely didn't > include some new projects. While we asked all projects to participate > initially, less than a dozen did. We will be sending an invitation for > new/underrepresented projects in the coming weeks. Please stand by and > know that we value your feedback and that of the community. > > Cheers! > > > >> On Sat, Apr 27, 2019 at 5:11 PM Allison Price wrote: >> >> Hi Michael, >> >> We reached out to all of the PTLs who had questions in the 2018 version of the survey to review and update their questions. If there is a project that was missed, we can add it and share anonymized results with the PTLs directly as well as the openstack-discsuss mailing list. >> >> If there is a question from the Octavia team, please let us know and we can add it for the 2019 survey. >> >> Cheers, >> Allison >> >> >> >> On Apr 27, 2019, at 4:01 PM, Michael Johnson wrote: >> >> Jimmy, >> >> I am curious, how did you reach out the PTLs for project specific >> questions? The Octavia team didn't receive any e-mail from you or >> Allison on the topic. >> >> Michael >> >> From dirk at dmllr.de Tue May 7 20:50:21 2019 From: dirk at dmllr.de (=?UTF-8?B?RGlyayBNw7xsbGVy?=) Date: Tue, 7 May 2019 22:50:21 +0200 Subject: [all][requirements][stable] requests version bump on stable brances {pike|queens} for CVE-2018-18074 In-Reply-To: <20190507203022.ctlwkqh4awa5z3ez@mthode.org> References: <20190507203022.ctlwkqh4awa5z3ez@mthode.org> Message-ID: Am Di., 7. 
Mai 2019 um 22:30 Uhr schrieb Matthew Thode : > Pike - 2.18.2 -> 2.20.1 - https://review.opendev.org/640727 > Queens - 2.18.4 -> 2.20.1 - https://review.opendev.org/640710 Specifically it looks like we're already at the next issue, as tracked here: https://github.com/kennethreitz/requests/issues/5065 Any concerns from anyone on these newer urllib3 updates? I guess we'll do them a bit later though. From johnsomor at gmail.com Tue May 7 20:51:56 2019 From: johnsomor at gmail.com (Michael Johnson) Date: Tue, 7 May 2019 13:51:56 -0700 Subject: [octavia][taskflow] Adaption of persistence/jobboard (taskflow expert review needed) In-Reply-To: References: Message-ID: Hi Octavia team, Thank you for the great discussion we had at the PTG and via video conference. I am super excited that we can start work on flow resumption. I have updated the Stroyboard story for the Jobboard work based on the discussion: https://storyboard.openstack.org/#!/story/2005072 I tried to break it down into parts that multiple people could work on in parallel. Please feel free to sign up for work you are interested in or to add additional tasks you might think of. Michael On Wed, Apr 24, 2019 at 6:15 AM Anna Taraday wrote: > > Thanks for your feedback! > > The good thing about implementation of taskflow approach is that backend type is set in configs and does not affect code. We can create config settings in a flexible way, so that operators could choose which backend is preferable for their cloud. Just have one option as default for devstack, testing, etc. > Having etcd seems to be a good option, I did some experiments with it several years ago. But my concern here, if we do not have taskflow experts it may take a lot of time to implement it properly in taskflow. > > It is good to hear that some of refactor could be align with other activities and won't just disrupt the main course of work. > Implementing all of this as an alternative controller driver is a great idea! In this case we can have it as experimental feature to gather some user feedback. > > Unfortunately, I'm not attending PTG, but I hope we will find a find to discuss this in IRC. > > On Wed, Apr 24, 2019 at 5:24 AM Michael Johnson wrote: >> >> Thank you Ann for working on this. It has been on our roadmap[1] for some time. >> >> Using Taskflow JobBoard would bring huge value to Octavia by allowing >> sub-flow resumption of tasks. >> >> I inquired about this in the oslo team meeting a few weeks ago and >> sadly it seems that most if not all of the taskflow experts are no >> longer working on OpenStack. This may mean "we" are the current >> Taskflow experts.... >> >> I also inquired about adding etcd as an option for the jobs engine >> implementation. Currently only Zookeeper and Redis are implemented. >> Etcd is attractive as it provides similar functionality (to my limited >> knowledge of what Taskflow needs) and is already an OpenStack base >> service[2]. This may be an additional chunk of work to make this a >> viable option. >> >> The refactor of the flow data storage from oslo.db/sqlalchemy data >> models aligns with some of the work we need to do to make the amphora >> driver a proper Octavia driver. Currently it doesn't fully use the >> provider driver interface data passing. This work could resolve two >> issues at the same time. >> >> It also looks like you have found a reasonable solution to the >> importable flows issue. >> >> I did include this on the topic list for the PTG[3] expecting we would >> need to discuss it there. 
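Illustrative aside (not from the thread): the "importable flows" point above comes down to taskflow's jobboard needing to re-create a flow by importing its factory by name, with only JSON-friendly inputs (plain dicts, not SQLAlchemy objects) kept in the persistence backend. A minimal sketch with invented names, not Octavia code:

    # sketch only: a jobboard-friendly flow factory
    from taskflow import task
    from taskflow.patterns import linear_flow

    class CreateLoadBalancer(task.Task):
        def execute(self, loadbalancer):
            # 'loadbalancer' is a plain dict, so it can be stored in the
            # persistence backend and re-hydrated when the job is resumed
            return loadbalancer["id"]

    def make_create_lb_flow():
        # module-level and importable, so taskflow can record the factory
        # by name and call it again on resumption
        flow = linear_flow.Flow("create-load-balancer")
        flow.add(CreateLoadBalancer())
        return flow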
I think we have a number of questions to >> answer on this topic. >> >> 1. Do we have resources to work on this? >> 2. Is Taskflow JobBoard the right solution? Is there alternative we >> could implement without the overhead of JobBoard? Maybe a hybrid >> approach is the right answer. >> 3. Are we ok with requiring either Zookeeper or Redis for this >> functionality? Do we need to implement a TaskFlow driver for etcd? >> 4. Should this be implemented as an alternate controller driver to the >> current implementation? (yes, even the controller is a driver in >> Octavia.) >> >> Are you planning to attend the PTG? If so we can work through these >> questions there, it is already on the agenda. >> If not, we should figure out either how to include you in that >> discussion, or continue the discussion on the mailing list. >> >> Michael >> >> [1] https://wiki.openstack.org/wiki/Octavia/Roadmap >> [2] https://governance.openstack.org/tc/reference/base-services.html >> [3] https://etherpad.openstack.org/p/octavia-train-ptg >> >> On Fri, Apr 19, 2019 at 6:16 AM Anna Taraday wrote: >> > >> > Hello everyone! >> > >> > I was looking at the topic of usage taskflow persistence and jobboard in Octavia [1]. >> > I created a simple PoC to check what should be done to enable this functionality [2] . >> > >> > From what I see, taskflow expects that data, which will be stored in persitence backend/jobboard backend, is a dict or an object easily converted to dicts [3] (error [3.1]) >> > Also functions that creates flow should be importable [4] (error [4.1]). >> > >> > These two points lead to refactor required for Octavia to enable taskflow persistence and jobboard: >> > 1) Convert data which is passed between flows in dicts, at this moment it is db objects with links to other db objects. >> > 2) Create importable flow functions. >> > >> > As far as I see the only OpenStack project which adapted taskflow persistence is poppy [5] >> > >> > I'm looking for taskflow expect to take a look at all this and give some comments - whether I am correct or missing something. >> > >> > Thank you for your time in advance! >> > >> > [1] - https://storyboard.openstack.org/#!/story/2005072 >> > [2] - https://review.openstack.org/#/c/647406 >> > [3] - https://github.com/openstack/taskflow/blob/master/taskflow/persistence/backends/impl_sqlalchemy.py#L458 >> > [3.1] - http://paste.openstack.org/show/749530/ >> > [4] - https://docs.openstack.org/taskflow/latest/_modules/taskflow/engines/helpers.html#save_factory_details >> > [4.1] - http://paste.openstack.org/show/749527/ >> > [5] - https://github.com/openstack/poppy >> > >> > >> > -- >> > Regards, >> > Ann Taraday >> > Mirantis, Inc > > > > -- > Regards, > Ann Taraday > Mirantis, Inc From mriedemos at gmail.com Tue May 7 20:53:30 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Tue, 7 May 2019 15:53:30 -0500 Subject: [watcher][qa] Thoughts on performance testing for Watcher Message-ID: <6409b4e4-29af-da6d-1af6-a0d6e753049c@gmail.com> Hi, I'm new to Watcher and would like to do some performance and scale testing in a simulated environment and wondering if anyone can give some pointers on what I could be testing or looking for. If possible, I'd like to be able to just setup a single-node devstack with the nova fake virt driver which allows me to create dozens of fake compute nodes. I could also create multiple cells with devstack, but there gets to be a limit with how much you can cram into a single node 8GB RAM 8VCPU VM (I could maybe split 20 nodes across 2 cells). 
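Illustrative aside (not from the thread): a devstack local.conf sketch for the fake-compute setup described above. The variable names assume devstack's stock fake-virt-driver knobs and the Watcher devstack plugin, so treat them as assumptions to verify against your branch rather than a recipe:

    [[local|localrc]]
    # fake virt driver: compute nodes are simulated, so dozens fit in one VM
    VIRT_DRIVER=fake
    # assumed devstack knob that spins up N fake nova-compute services
    NUMBER_FAKE_NOVA_COMPUTE=20
    # pull in Watcher via its devstack plugin
    enable_plugin watcher https://opendev.org/openstack/watcher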
I could then create dozens of VMs to fill into those compute nodes. I'm mostly trying to figure out what could be an interesting set of tests. The biggest problem I'm trying to solve with Watcher is optimizing resource utilization, i.e. once the computes hit the Tetris problem and there is some room on some nodes but none of the nodes are fully packed. I was thinking I could simulate this by configuring nova so it spreads rather than packs VMs onto hosts (or just use the chance scheduler which randomly picks a host), using VMs of varying sizes, and then run some audit / action plan (I'm still learning the terminology here) to live migrate the VMs such that they get packed onto as few hosts as possible and see how long that takes. Naturally with devstack using fake nodes and no networking on the VMs, that live migration is basically a noop, but I'm more interested in profiling how long it takes Watcher itself to execute the actions. Once I get to know a bit more about how Watcher works, I could help with optimizing some of the nova-specific stuff using placement [1]. Any advice or guidance here would be appreciated. [1] https://review.opendev.org/#/c/656448/ -- Thanks, Matt From dirk at dmllr.de Tue May 7 21:02:57 2019 From: dirk at dmllr.de (=?UTF-8?B?RGlyayBNw7xsbGVy?=) Date: Tue, 7 May 2019 23:02:57 +0200 Subject: [all|requirements|stable] update django 1.x to 1.11.20 Message-ID: Hi, a number of security issues have been fixed for django 1.11.x which is still used by horizon for python 2.x and also optionally for python 3.x. The horizon gate jobs are already using that version: http://logs.openstack.org/46/651546/1/check/horizon-openstack-tox-python3-django111/7f0a6e0/job-output.txt.gz#_2019-04-10_14_22_10_604693 as they install django without using constraints.txt . Any objections to updating the global requirements constraints to match that? Reviewing the django fixes on the 1.11.x closely only shows security and data corruption bugfixes, so it should be pretty good on the risk/benefit trade-off. Thanks, Dirk From jp.methot at planethoster.info Tue May 7 21:31:19 2019 From: jp.methot at planethoster.info (=?utf-8?Q?Jean-Philippe_M=C3=A9thot?=) Date: Tue, 7 May 2019 17:31:19 -0400 Subject: [ops][nova]Logging in nova and other openstack projects In-Reply-To: References: Message-ID: <62034C21-91FC-4A9A-BC4B-47E372EAB925@planethoster.info> Indeed, this is what was written in your original response as well as in the documentation. As a result, it was fairly difficult to miss and I did comment it out before restarting the service. Additionally, as per the configuration I had set up, had the log-config-append option be set, I wouldn’t have any INFO level log in my logs. Hence why I believe it is strange that I have info level logs, when I’ve set default_log_levels like this: default_log_levels = amqp=WARN,amqplib=WARN,boto=WARN,qpid=WARN,sqlalchemy=WARN,suds=WARN,oslo.messaging=WARN,iso8601=WARN,requests.packages.urllib3.connectionpool=WARN,urllib3.connectionpool=WARN,websocket=WARN,requests.packages.urllib3.util.retry=WARN,urllib3.util.retry=WARN,keystonemiddleware=WARN,routes.middleware=WARN,stevedore=WARN,taskflow=WARN,keystoneauth=WARN,oslo.cache=WARN Please understand that I am not doubting that your previous answer normally works. I have seen your presentations at past Openstack summit and know that you are a brilliant individual. However, I can only answer here that, from my observations, this is not working as intended. 
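Illustrative aside (not from the thread): with stock oslo.log, each entry in default_log_levels is applied as a per-logger level, while loggers that are not listed (including nova's own, such as nova.compute.resource_tracker) inherit the root logger, which stays at INFO unless debug is set. If that is what is happening here, a hedged workaround is to list nova itself, e.g. in nova.conf:

    [DEFAULT]
    # assumption: adding nova=WARN caps nova's own loggers at WARNING,
    # on top of the third-party entries already listed above
    default_log_levels = nova=WARN,amqp=WARN,amqplib=WARN,oslo.messaging=WARN,keystonemiddleware=WARN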
I’ll also add that this is on Pike, but we are slated to upgrade to Queens in the coming weeks. Best regards, Jean-Philippe Méthot Openstack system administrator Administrateur système Openstack PlanetHoster inc. > Le 7 mai 2019 à 11:39, Jay Pipes a écrit : > > As mentioned in my original response, if you have CONF.log_config_append set to anything, then the other conf options related to logging will be ignored. > > Best, > -jay > > On Tue, May 7, 2019, 11:15 AM Jean-Philippe Méthot > wrote: > Hi, > > I’ve just tried setting everything to warn through the nova.conf option default_log_levels, as suggested. However, I’m still getting info level logs from the resource tracker like this : > > INFO nova.compute.resource_tracker > > Could the compute resource tracker logs be managed by another parameter than what’s in the default list for that configuration option? > > Best regards, > > Jean-Philippe Méthot > Openstack system administrator > Administrateur système Openstack > PlanetHoster inc. > > > > >> Le 7 mai 2019 à 09:02, Jay Pipes > a écrit : >> >> On 05/06/2019 05:56 PM, Jean-Philippe Méthot wrote: >>> Hi, >>> We’ve been modifying our login habits for Nova on our Openstack setup to try to send only warning level and up logs to our log servers. To do so, I’ve created a logging.conf and configured logging according to the logging module documentation. While what I’ve done works, it seems to be a very convoluted process for something as simple as changing the logging level to warning. We worry that if we upgrade and the syntax for this configuration file changes, we may have to push more changes through ansible than we would like to. >> >> It's unlikely that the syntax for the logging configuration file will change since it's upstream Python, not OpenStack or Nova that is the source of this syntax. >> >> That said, if all you want to do is change some or all package default logging levels, you can change the value of the CONF.default_log_levels option. >> >> The default_log_levels CONF option is actually derived from the oslo_log package that is used by all OpenStack service projects. It's default value is here: >> >> https://github.com/openstack/oslo.log/blob/29671ef2bfacb416d397abc57170bb090b116f68/oslo_log/_options.py#L19-L31 >> >> So, if you don't want to mess with the standard Python logging conf, you can just change that CONF.default_log_levels option. Note that if you do specify a logging config file using a non-None CONF.log_config_append value, then all other logging configuration options (like default_log_levels) are ignored). >> >> Best, >> -jay >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From openstack at nemebean.com Tue May 7 21:45:38 2019 From: openstack at nemebean.com (Ben Nemec) Date: Tue, 7 May 2019 16:45:38 -0500 Subject: [oslo][oslo-messaging][nova] Stein nova-api AMQP issue running under uWSGI In-Reply-To: References: <229a2a53-870f-44c3-5e0c-6cfa9d45d0c5@oracle.com> <3275304e-d717-8b89-557e-b650fc4f661a@oracle.com> <20190420063850.GA18527@holtby.speedport.ip> <8b9cb0e4-b3a4-986a-be59-5bba6ae00f4e@nemebean.com> <20190503175904.GA26117@holtby> Message-ID: <8411da3c-9318-2189-5149-2beb9cab4bd0@nemebean.com> On 5/4/19 4:14 PM, Damien Ciabrini wrote: > > > On Fri, May 3, 2019 at 7:59 PM Michele Baldessari > wrote: > > On Mon, Apr 22, 2019 at 01:21:03PM -0500, Ben Nemec wrote: > > > > > > On 4/22/19 12:53 PM, Alex Schultz wrote: > > > On Mon, Apr 22, 2019 at 11:28 AM Ben Nemec > > wrote: > > > > > > > > > > > > > > > > On 4/20/19 1:38 AM, Michele Baldessari wrote: > > > > > On Fri, Apr 19, 2019 at 03:20:44PM -0700, > iain.macdonnell at oracle.com wrote: > > > > > > > > > > > > Today I discovered that this problem appears to be caused > by eventlet > > > > > > monkey-patching. I've created a bug for it: > > > > > > > > > > > > https://bugs.launchpad.net/nova/+bug/1825584 > > > > > > > > > > Hi, > > > > > > > > > > just for completeness we see this very same issue also with > > > > > mistral (actually it was the first service where we noticed > the missed > > > > > heartbeats). iirc Alex Schultz mentioned seeing it in > ironic as well, > > > > > although I have not personally observed it there yet. > > > > > > > > Is Mistral also mixing eventlet monkeypatching and WSGI? > > > > > > > > > > Looks like there is monkey patching, however we noticed it with the > > > engine/executor. So it's likely not just wsgi.  I think I also > saw it > > > in the ironic-conductor, though I'd have to try it out again.  I'll > > > spin up an undercloud today and see if I can get a more > complete list > > > of affected services. It was pretty easy to reproduce. > > > > Okay, I asked because if there's no WSGI/Eventlet combination > then this may > > be different from the Nova issue that prompted this thread. It > sounds like > > that was being caused by a bad interaction between WSGI and some > Eventlet > > timers. If there's no WSGI involved then I wouldn't expect that > to happen. > > > > I guess we'll see what further investigation turns up, but based > on the > > preliminary information there may be two bugs here. > > So just to get some closure on this error that we have seen around > mistral executor and tripleo with python3: this was due to the ansible > action that called subprocess which has a different implementation in > python3 and so the monkeypatching needs to be adapted. > > Review which fixes it for us is here: > https://review.opendev.org/#/c/656901/ > > Damien and I think the nova_api/eventlet/mod_wsgi has a separate > root-cause > (although we have not spent all too much time on that one yet) > > > Right, after further investigation, it appears that the problem we saw > under mod_wsgi was due to monkey patching, as Iain originally > reported. It has nothing to do with our work on healthchecks. > > It turns out that running the AMQP heartbeat thread under mod_wsgi > doesn't work when the threading library is monkey_patched, because the > thread waits on a data structure [1] that has been monkey patched [2], > which makes it yield its execution instead of sleeping for 15s. 
> > Because mod_wsgi stops the execution of its embedded interpreter, the > AMQP heartbeat thread can't be resumed until there's a message to be > processed in the mod_wsgi queue, which would resume the python > interpreter and make eventlet resume the thread. > > Disabling monkey-patching in nova_api makes the scheduling issue go > away. This sounds like the right long-term solution, but it seems unlikely to be backportable to the existing releases. As I understand it some nova-api functionality has an actual dependency on monkey-patching. Is there a workaround? Maybe periodically poking the API to wake up the wsgi interpreter? > > Note: other services like heat-api do not use monkey patching and > aren't affected, so this seem to confirm that monkey-patching > shouldn't happen in nova_api running under mod_wsgi in the first > place. > > [1] > https://github.com/openstack/oslo.messaging/blob/master/oslo_messaging/_drivers/impl_rabbit.py#L904 > [2] > https://github.com/openstack/oslo.utils/blob/master/oslo_utils/eventletutils.py#L182 From iain.macdonnell at oracle.com Tue May 7 22:22:36 2019 From: iain.macdonnell at oracle.com (iain.macdonnell at oracle.com) Date: Tue, 7 May 2019 15:22:36 -0700 Subject: [oslo][oslo-messaging][nova] Stein nova-api AMQP issue running under uWSGI In-Reply-To: <8411da3c-9318-2189-5149-2beb9cab4bd0@nemebean.com> References: <229a2a53-870f-44c3-5e0c-6cfa9d45d0c5@oracle.com> <3275304e-d717-8b89-557e-b650fc4f661a@oracle.com> <20190420063850.GA18527@holtby.speedport.ip> <8b9cb0e4-b3a4-986a-be59-5bba6ae00f4e@nemebean.com> <20190503175904.GA26117@holtby> <8411da3c-9318-2189-5149-2beb9cab4bd0@nemebean.com> Message-ID: <1537695d-fe31-3e48-36d7-566a92307a93@oracle.com> On 5/7/19 2:45 PM, Ben Nemec wrote: > > > On 5/4/19 4:14 PM, Damien Ciabrini wrote: >> >> >> On Fri, May 3, 2019 at 7:59 PM Michele Baldessari > > wrote: >> >>     On Mon, Apr 22, 2019 at 01:21:03PM -0500, Ben Nemec wrote: >>      > >>      > >>      > On 4/22/19 12:53 PM, Alex Schultz wrote: >>      > > On Mon, Apr 22, 2019 at 11:28 AM Ben Nemec >>     > wrote: >>      > > > >>      > > > >>      > > > >>      > > > On 4/20/19 1:38 AM, Michele Baldessari wrote: >>      > > > > On Fri, Apr 19, 2019 at 03:20:44PM -0700, >>     iain.macdonnell at oracle.com wrote: >>      > > > > > >>      > > > > > Today I discovered that this problem appears to be caused >>     by eventlet >>      > > > > > monkey-patching. I've created a bug for it: >>      > > > > > >>      > > > > > >> https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.launchpad.net_nova_-2Bbug_1825584&d=DwIDaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=RxYkIjeLZPK2frXV_wEUCq8d3wvUIvDPimUcunMwbMs&m=vdmZv2wQnoFF1TIFnkN4XXdIjy0p4TKcsQ598Qbjti4&s=zgCsi2WthDNaeptBSW02iplSjxg9P_zrnfocp8P06oA&e= >> >>      > > > > >>      > > > > Hi, >>      > > > > >>      > > > > just for completeness we see this very same issue also with >>      > > > > mistral (actually it was the first service where we noticed >>     the missed >>      > > > > heartbeats). iirc Alex Schultz mentioned seeing it in >>     ironic as well, >>      > > > > although I have not personally observed it there yet. >>      > > > >>      > > > Is Mistral also mixing eventlet monkeypatching and WSGI? >>      > > > >>      > > >>      > > Looks like there is monkey patching, however we noticed it >> with the >>      > > engine/executor. So it's likely not just wsgi.  I think I also >>     saw it >>      > > in the ironic-conductor, though I'd have to try it out >> again.  
I'll >>      > > spin up an undercloud today and see if I can get a more >>     complete list >>      > > of affected services. It was pretty easy to reproduce. >>      > >>      > Okay, I asked because if there's no WSGI/Eventlet combination >>     then this may >>      > be different from the Nova issue that prompted this thread. It >>     sounds like >>      > that was being caused by a bad interaction between WSGI and some >>     Eventlet >>      > timers. If there's no WSGI involved then I wouldn't expect that >>     to happen. >>      > >>      > I guess we'll see what further investigation turns up, but based >>     on the >>      > preliminary information there may be two bugs here. >> >>     So just to get some closure on this error that we have seen around >>     mistral executor and tripleo with python3: this was due to the >> ansible >>     action that called subprocess which has a different implementation in >>     python3 and so the monkeypatching needs to be adapted. >> >>     Review which fixes it for us is here: >> >> https://urldefense.proofpoint.com/v2/url?u=https-3A__review.opendev.org_-23_c_656901_&d=DwIDaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=RxYkIjeLZPK2frXV_wEUCq8d3wvUIvDPimUcunMwbMs&m=vdmZv2wQnoFF1TIFnkN4XXdIjy0p4TKcsQ598Qbjti4&s=1o81kC60gB8_5zIgi8WugZaOma_3m7grG4RQ-aVsbSE&e= >> >> >>     Damien and I think the nova_api/eventlet/mod_wsgi has a separate >>     root-cause >>     (although we have not spent all too much time on that one yet) >> >> >> Right, after further investigation, it appears that the problem we saw >> under mod_wsgi was due to monkey patching, as Iain originally >> reported. It has nothing to do with our work on healthchecks. >> >> It turns out that running the AMQP heartbeat thread under mod_wsgi >> doesn't work when the threading library is monkey_patched, because the >> thread waits on a data structure [1] that has been monkey patched [2], >> which makes it yield its execution instead of sleeping for 15s. >> >> Because mod_wsgi stops the execution of its embedded interpreter, the >> AMQP heartbeat thread can't be resumed until there's a message to be >> processed in the mod_wsgi queue, which would resume the python >> interpreter and make eventlet resume the thread. >> >> Disabling monkey-patching in nova_api makes the scheduling issue go >> away. > > This sounds like the right long-term solution, but it seems unlikely to > be backportable to the existing releases. As I understand it some > nova-api functionality has an actual dependency on monkey-patching. Is > there a workaround? Maybe periodically poking the API to wake up the > wsgi interpreter? I've been pondering things like that ... but if I have multiple WSGI processes, can I be sure that an API-poke will hit the one(s) that need it? This is a road-block for me upgrading to Stein. I really don't want to have to go back to running nova-api standalone, but that's increasingly looking like the only "safe" option :/ ~iain >> Note: other services like heat-api do not use monkey patching and >> aren't affected, so this seem to confirm that monkey-patching >> shouldn't happen in nova_api running under mod_wsgi in the first >> place. 
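Illustrative aside (not from the thread): a tiny standalone sketch, not oslo.messaging code, of the mechanism described above. After monkey patching, the roughly 15-second heartbeat wait becomes a cooperative yield to the eventlet hub rather than a real OS-level sleep, and under mod_wsgi the hub only runs while a request is being handled:

    import eventlet
    eventlet.monkey_patch()

    import threading
    import time

    event = threading.Event()   # eventlet's green Event after monkey_patch()
    start = time.time()
    event.wait(timeout=2)       # scheduled by the eventlet hub, not the kernel
    print("waited %.1fs" % (time.time() - start))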
>> >> [1] >> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_openstack_oslo.messaging_blob_master_oslo-5Fmessaging_-5Fdrivers_impl-5Frabbit.py-23L904&d=DwIDaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=RxYkIjeLZPK2frXV_wEUCq8d3wvUIvDPimUcunMwbMs&m=vdmZv2wQnoFF1TIFnkN4XXdIjy0p4TKcsQ598Qbjti4&s=O5nQh1r8Zmded00yYMXrfxL44xcd9KqFK-VOa0cg6gs&e= >> >> [2] >> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_openstack_oslo.utils_blob_master_oslo-5Futils_eventletutils.py-23L182&d=DwIDaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=RxYkIjeLZPK2frXV_wEUCq8d3wvUIvDPimUcunMwbMs&m=vdmZv2wQnoFF1TIFnkN4XXdIjy0p4TKcsQ598Qbjti4&s=QRkXCiqv6zcnO2b2p8Uv6cgRuu1R414B9SvILuugN6w&e= >> > From cboylan at sapwetik.org Tue May 7 23:56:43 2019 From: cboylan at sapwetik.org (Clark Boylan) Date: Tue, 07 May 2019 19:56:43 -0400 Subject: [nova][CI] GPUs in the gate In-Reply-To: References: Message-ID: <3587e05d-deab-42ad-9a02-4312ca11760f@www.fastmail.com> On Tue, May 7, 2019, at 10:48 AM, Artom Lifshitz wrote: > Hey all, > > Following up on the CI session during the PTG [1], I wanted to get the > ball rolling on getting GPU hardware into the gate somehow. Initially > the plan was to do it through OpenLab and by convincing NVIDIA to > donate the cards, but after a conversation with Sean McGinnis it > appears Infra have access to machines with GPUs. > > From Nova's POV, the requirements are: > * The machines with GPUs should probably be Ironic baremetal nodes and > not VMs [*]. > * The GPUs need to support virtualization. It's hard to get a > comprehensive list of GPUs that do, but Nova's own docs [2] mention > two: Intel cards with GVT [3] and NVIDIA GRID [4]. > > So I think at this point the question is whether Infra can support > those reqs. If yes, we can start concrete steps towards getting those > machines used by a CI job. If not, we'll fall back to OpenLab and try > to get them hardware. What we currently have access to is a small amount of Vexxhost's GPU instances (so mnaser can further clarify my comments here). I believe these are VMs with dedicated nvidia gpus that are passed through. I don't think they support the vgpu feature. It might help to describe the use case you are trying to meet rather than jumping ahead to requirements/solutions. That way maybe we can work with Vexxhost to better support what you need (or come up with some other solutions). For those of us that don't know all of the particulars it really does help if you can go from use case to requirements. > > [*] Could we do double-passthrough? Could the card be passed through > to the L1 guest via the PCI passthrough mechanism, and then into the > L2 guest via the mdev mechanism? > > [1] https://etherpad.openstack.org/p/nova-ptg-train-ci > [2] https://docs.openstack.org/nova/rocky/admin/virtual-gpu.html > [3] https://01.org/igvt-g > [4] https://docs.nvidia.com/grid/5.0/pdf/grid-vgpu-user-guide.pdf From miguel at mlavalle.com Wed May 8 01:37:44 2019 From: miguel at mlavalle.com (Miguel Lavalle) Date: Tue, 7 May 2019 20:37:44 -0500 Subject: [openstack-dev] [neutron] Cancelling L3 sub-team meeting on May 8th Message-ID: Hi Neutrinos, Since we just had a long conversation on L3 topics during the PTG, we will cancel this week's meeting. We will resume normally on the 15th Regards -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dangtrinhnt at gmail.com Wed May 8 02:52:14 2019
From: dangtrinhnt at gmail.com (Trinh Nguyen)
Date: Wed, 8 May 2019 11:52:14 +0900
Subject: [searchlight] Train-1 milestone goals
Message-ID: 
Hi team,
So the summit is over, the holiday is over, and the Train-1 milestone [1] is coming... I would like to take this chance to discuss a little bit about our targets for Train-1. My expectation is simply to continue what we left off in Stein:
- Deprecate Elasticsearch 2.x
- Support multiple OpenStack clouds
Please let me know what you think via email or input in the etherpad [2].
[1] https://releases.openstack.org/train/schedule.html
[2] https://etherpad.openstack.org/p/searchlight-train
You rock!!!
-- *Trinh Nguyen* *www.edlab.xyz *
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From dangtrinhnt at gmail.com Wed May 8 03:17:22 2019
From: dangtrinhnt at gmail.com (Trinh Nguyen)
Date: Wed, 8 May 2019 12:17:22 +0900
Subject: [telemetry] Team meeting agenda for tomorrow
Message-ID: 
Hi team,
As planned, we will have a team meeting at 02:00 UTC, May 9th on #openstack-telemetry to discuss what we're going to do for the next milestone (Train-1) and continue what we left off from the last meeting. I put the agenda here [1], thinking that it should be fine for an hour-long meeting. If you have anything to talk about, please put it there too.
[1] https://etherpad.openstack.org/p/telemetry-meeting-agenda
Bests,
-- *Trinh Nguyen* *www.edlab.xyz *
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From li.canwei2 at zte.com.cn Wed May 8 03:26:50 2019
From: li.canwei2 at zte.com.cn (li.canwei2 at zte.com.cn)
Date: Wed, 8 May 2019 11:26:50 +0800 (CST)
Subject: =?UTF-8?B?UmU6W3dhdGNoZXJdW3FhXSBUaG91Z2h0cyBvbiBwZXJmb3JtYW5jZSB0ZXN0aW5nIGZvciBXYXRjaGVy?=
In-Reply-To: <6409b4e4-29af-da6d-1af6-a0d6e753049c@gmail.com>
References: 6409b4e4-29af-da6d-1af6-a0d6e753049c@gmail.com
Message-ID: <201905081126508513380@zte.com.cn>
Hi Matt,
I'm glad that you are interested in Watcher. Though we have never done such a simulated test, I hope you can get what you want. Some notes:
1, Watcher updates its data model based on nova versioned notifications, so you should enable nova notifications in your simulated environment.
2, Watcher gets the node name from CONF.host or socket.gethostname; if you have two or more controller nodes, they must not have the same host name.
3, Watcher doesn't consider nova cells; currently Watcher filters nodes through host aggregates and zones. You can get more info with the CLI cmd: watcher help audittemplate create
4, Watcher needs a metric data source such as Ceilometer, so your fake nodes and VMs should have metric data.
5, For optimizing resource utilization, I think you could use the strategy in [1]
6, There are two audit types, ONESHOT and CONTINUOUS, in Watcher; you can get more help with the CLI cmd: watcher help audit create
If you have any questions, let us know
Thanks,
licanwei
[1] https://docs.openstack.org/watcher/latest/strategies/vm_workload_consolidation.html
发件人:MattRiedemann 收件人:openstack-discuss at lists.openstack.org ; 日 期 :2019年05月08日 04:57 主 题 :[watcher][qa] Thoughts on performance testing for Watcher
Hi,
I'm new to Watcher and would like to do some performance and scale testing in a simulated environment and wondering if anyone can give some pointers on what I could be testing or looking for. If possible, I'd like to be able to just setup a single-node devstack with the nova fake virt driver which allows me to create dozens of fake compute nodes.
I could also create multiple cells with devstack, but there gets to be a limit with how much you can cram into a single node 8GB RAM 8VCPU VM (I could maybe split 20 nodes across 2 cells). I could then create dozens of VMs to fill into those compute nodes. I'm mostly trying to figure out what could be an interesting set of tests. The biggest problem I'm trying to solve with Watcher is optimizing resource utilization, i.e. once the computes hit the Tetris problem and there is some room on some nodes but none of the nodes are fully packed. I was thinking I could simulate this by configuring nova so it spreads rather than packs VMs onto hosts (or just use the chance scheduler which randomly picks a host), using VMs of varying sizes, and then run some audit / action plan (I'm still learning the terminology here) to live migrate the VMs such that they get packed onto as few hosts as possible and see how long that takes. Naturally with devstack using fake nodes and no networking on the VMs, that live migration is basically a noop, but I'm more interested in profiling how long it takes Watcher itself to execute the actions. Once I get to know a bit more about how Watcher works, I could help with optimizing some of the nova-specific stuff using placement [1]. Any advice or guidance here would be appreciated. [1] https://review.opendev.org/#/c/656448/ -- Thanks, Matt -------------- next part -------------- An HTML attachment was scrubbed... URL: From rico.lin.guanyu at gmail.com Wed May 8 05:34:07 2019 From: rico.lin.guanyu at gmail.com (Rico Lin) Date: Wed, 8 May 2019 13:34:07 +0800 Subject: [heat][ptg] Summary for Heat from Denver Summit Message-ID: Hi all Here's the etherpad for Heat in Denver Summit and PTG: https://etherpad.openstack.org/p/DEN-Train-Heat I will update heat project onboarding video to slide page later, at meanwhile, enjoy slides for onboarding: https://goo.gl/eZ3bbH and project update: https://goo.gl/Fr6rBH *Some target items for Train cycle:* - Move to service token auth for re-auth - Hide OS::Glance::Image and mark as placeholder resource - make placeholder designate V1 resources too since they already deleted in Designate in Rocky - Heat zombie services entries recycle - Atomic ExtraRoute resource improvement - Better document and scenario test support for Auto-scaling SIG and Self-healing SIG - Ironic resources (Don't get this wrong, Heat already support Ironic by using Nova server resources. This is about directly supported Ironic resources) - Adding support for Heat in Terraform For some ongoing tasks like Vitrage Template still on our review target list too, and we will try to make sure those works for features/deprecation tasks will be landed soon as we can. *Help most needed for Heat:* We very much need for core reviewers to help to push patches in. So, please join us for help to review an develop. I hope we can still keep Heat develop more active. Like we *No meeting for this week* Since people just come back from Summit, and I pretty sure some of us still in other events now, so let's skip meeting this week. -- May The Force of OpenStack Be With You, *Rico Lin*irc: ricolin -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From li.canwei2 at zte.com.cn Wed May 8 05:38:51 2019 From: li.canwei2 at zte.com.cn (li.canwei2 at zte.com.cn) Date: Wed, 8 May 2019 13:38:51 +0800 (CST) Subject: =?UTF-8?B?W1dhdGNoZXJdIHRlYW0gbWVldGluZyBhbmQgYWdlbmRh?= Message-ID: <201905081338518155467@zte.com.cn> Hi, Watcher will have a meeting again at 08:00 UTC in the #openstack-meeting-alt channel. The agenda is available on https://wiki.openstack.org/wiki/Watcher_Meeting_Agenda feel free to add any additional items. Thanks! Canwei Li -------------- next part -------------- An HTML attachment was scrubbed... URL: From josephine.seifert at secustack.com Wed May 8 06:02:39 2019 From: josephine.seifert at secustack.com (Josephine Seifert) Date: Wed, 8 May 2019 08:02:39 +0200 Subject: [nova][cinder][glance][Barbican]Finding Timeslot for weekly Image Encryption IRC meeting In-Reply-To: <6cdb30ba-888c-cd89-5bff-f432edb90467@redhat.com> References: <6cdb30ba-888c-cd89-5bff-f432edb90467@redhat.com> Message-ID: Hi Douglas, it seems, that doodle put the "UTC"  in the second line. It should say e.g. "Mon 12 UTC" meaning Mondays at 12:00 UTC and so on. Greetings, Josephine (Luzi) Am 07.05.19 um 17:37 schrieb Douglas Mendizábal: > Hi Josephine, > > I think it's a great idea to have a recurring meeting to keep track of > the Image Encryption effort.   I tried to answer your doodle, but it > seems that it does not have actual times, just dates?  Maybe we need a > new doodle?  I live in the CDT (UTC-5) Time Zone if that helps. > > Thanks, > - Douglas Mendizábal (redrobot) > > On 5/4/19 1:57 PM, Josephine Seifert wrote: > > Hello, > > > as a result from the Summit and the PTG, I would like to hold a > > weekly IRC-meeting for the Image Encryption (soon to be a pop-up > > team). > > > As I work in Europe I have made a doodle poll, with timeslots I > > can attend and hopefully many of you. If you would like to join in > > a weekly meeting, please fill out the poll and state your name and > > the project you are working in: > > https://doodle.com/poll/wtg9ha3e5dvym6yt > > > Thank you Josephine (Luzi) > > > From li.canwei2 at zte.com.cn Wed May 8 06:19:17 2019 From: li.canwei2 at zte.com.cn (li.canwei2 at zte.com.cn) Date: Wed, 8 May 2019 14:19:17 +0800 (CST) Subject: =?UTF-8?B?UmU6W3dhdGNoZXJdW3FhXSBUaG91Z2h0cyBvbiBwZXJmb3JtYW5jZSB0ZXN0aW5nIGZvciBXYXRjaGVy?= In-Reply-To: <6409b4e4-29af-da6d-1af6-a0d6e753049c@gmail.com> References: 6409b4e4-29af-da6d-1af6-a0d6e753049c@gmail.com Message-ID: <201905081419177826734@zte.com.cn> another note, Watcher provides a WORKLOAD optimization(balancing or consolidation). If you want to maximize the node resource (such as vCPU, Ram...) usage through VM migration, Watcher doesn't have such a strategy now. Thanks! licanwei 原始邮件 发件人:MattRiedemann 收件人:openstack-discuss at lists.openstack.org ; 日 期 :2019年05月08日 04:57 主 题 :[watcher][qa] Thoughts on performance testing for Watcher Hi, I'm new to Watcher and would like to do some performance and scale testing in a simulated environment and wondering if anyone can give some pointers on what I could be testing or looking for. If possible, I'd like to be able to just setup a single-node devstack with the nova fake virt driver which allows me to create dozens of fake compute nodes. I could also create multiple cells with devstack, but there gets to be a limit with how much you can cram into a single node 8GB RAM 8VCPU VM (I could maybe split 20 nodes across 2 cells). I could then create dozens of VMs to fill into those compute nodes. 
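Illustrative aside (not from the thread): the audittemplate/audit commands referenced in the notes above could be strung together roughly as below for the consolidation case. Option names are from memory, so check the output of watcher help audittemplate create and watcher help audit create on your release:

    # sketch only: goal and strategy names follow the
    # vm_workload_consolidation docs linked earlier
    watcher audittemplate create at-consolidation server_consolidation \
        --strategy vm_workload_consolidation
    watcher audit create -a at-consolidation
    watcher actionplan list
    # start the resulting action plan and time how long its actions take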
I'm mostly trying to figure out what could be an interesting set of tests. The biggest problem I'm trying to solve with Watcher is optimizing resource utilization, i.e. once the computes hit the Tetris problem and there is some room on some nodes but none of the nodes are fully packed. I was thinking I could simulate this by configuring nova so it spreads rather than packs VMs onto hosts (or just use the chance scheduler which randomly picks a host), using VMs of varying sizes, and then run some audit / action plan (I'm still learning the terminology here) to live migrate the VMs such that they get packed onto as few hosts as possible and see how long that takes. Naturally with devstack using fake nodes and no networking on the VMs, that live migration is basically a noop, but I'm more interested in profiling how long it takes Watcher itself to execute the actions. Once I get to know a bit more about how Watcher works, I could help with optimizing some of the nova-specific stuff using placement [1]. Any advice or guidance here would be appreciated. [1] https://review.opendev.org/#/c/656448/ -- Thanks, Matt -------------- next part -------------- An HTML attachment was scrubbed... URL: From bharat at stackhpc.com Wed May 8 06:40:04 2019 From: bharat at stackhpc.com (Bharat Kunwar) Date: Wed, 8 May 2019 07:40:04 +0100 Subject: Magnum Kubernetes openstack-cloud-controller-manager unable not resolve master node by DNS In-Reply-To: References: Message-ID: <4FFA2395-960B-4DA7-8481-F2AD93EAB500@stackhpc.com> Try using the latest version, think there is an OCCM_TAG. Sent from my iPhone > On 7 May 2019, at 20:10, Pawel Konczalski wrote: > > Hi, > > i try to deploy a Kubernetes cluster with OpenStack Magnum but the openstack-cloud-controller-manager pod fails to resolve the master node hostname. > > Does magnum require further parameter to configure the DNS names of the master and minions? DNS resolution in the VMs works fine. Currently there is no Designate installed in the OpenStack setup. 
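Illustrative aside (not from the original reply): the kubectl failure at the end of the quoted output below is plain name resolution, since 8.8.8.8 cannot resolve the master's nova hostname. One hedged workaround is to map the master address Magnum reports onto that hostname wherever kubectl runs:

    # sketch only: look up the master address, then add a hosts entry
    openstack coe cluster show kubernetes-cluster1 -c master_addresses -f value
    # e.g. if it returns 10.0.0.5:
    echo "10.0.0.5 kubernetes-cluster1-vulg5fz6hg2n-master-0" | sudo tee -a /etc/hosts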
> > > openstack coe cluster template create kubernetes-cluster-template1 \ > --image Fedora-AtomicHost-29-20190429.0.x86_64 \ > --external-network public \ > --dns-nameserver 8.8.8.8 \ > --master-flavor m1.kubernetes \ > --flavor m1.kubernetes \ > --coe kubernetes \ > --volume-driver cinder \ > --network-driver flannel \ > --docker-volume-size 25 > > openstack coe cluster create kubernetes-cluster1 \ > --cluster-template kubernetes-cluster-template1 \ > --master-count 1 \ > --node-count 2 \ > --keypair mykey > > > # kubectl get pods --all-namespaces -o wide > NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE > kube-system coredns-78df4bf8ff-mjp2c 0/1 Pending 0 36m > kube-system heapster-74f98f6489-tgtzl 0/1 Pending 0 36m > kube-system kube-dns-autoscaler-986c49747-wrvz4 0/1 Pending 0 36m > kube-system kubernetes-dashboard-54cb7b5997-sk5pj 0/1 Pending 0 36m > kube-system openstack-cloud-controller-manager-dgk64 0/1 CrashLoopBackOff 11 36m kubernetes-cluster1-vulg5fz6hg2n-master-0 > > > # kubectl -n kube-system logs openstack-cloud-controller-manager-dgk64 > Error from server: Get https://kubernetes-cluster1-vulg5fz6hg2n-master-0:10250/containerLogs/kube-system/openstack-cloud-controller-manager-dgk64/openstack-cloud-controller-manager: dial tcp: lookup kubernetes-cluster1-vulg5fz6hg2n-master-0 on 8.8.8.8:53: no such host > > > BR > > Pawel From cjeanner at redhat.com Wed May 8 07:07:02 2019 From: cjeanner at redhat.com (=?UTF-8?Q?C=c3=a9dric_Jeanneret?=) Date: Wed, 8 May 2019 09:07:02 +0200 Subject: [TripleO][Validations] Tag convention In-Reply-To: References: <3c383d8d-54fa-b054-f0ad-b97ed67ba03f@redhat.com> Message-ID: <5228e551-477c-129e-d621-9b1bde9a6535@redhat.com> On 5/7/19 6:24 PM, Mohammed Naser wrote: > On Tue, May 7, 2019 at 12:12 PM Emilien Macchi wrote: >> >> >> >> On Tue, May 7, 2019 at 4:44 PM Cédric Jeanneret wrote: >>> >>> Dear all, >>> >>> We're currently working hard in order to provide a nice way to run >>> validations within a deploy (aka in-flight validations). >>> >>> We can already call validations provided by the tripleo-validations >>> package[1], it's working just fine. >>> >>> Now comes the question: "how can we disable the validations?". In order >>> to do that, we propose to use a standard tag in the ansible >>> roles/playbooks, and to add a "--skip-tags " when we disable the >>> validations via the CLI or configuration. >>> >>> After a quick check in the tripleoclient code, there apparently is a tag >>> named "validation", that can already be skipped from within the client. >>> >>> So, our questions: >>> - would the reuse of "validation" be OK? >>> - if not, what tag would be best in order to avoid confusion? >>> >>> We also have the idea to allow to disable validations per service. For >>> this, we propose to introduce the following tag: >>> - validation-, like "validation-nova", "validation-neutron" and >>> so on >>> >>> What do you think about those two additions? >> >> >> Such as variables, I think we should prefix all our variables and tags with tripleo_ or something, to differentiate them from any other playbooks our operators could run. >> I would rather use "tripleo_validations" and "tripleo_validation_nova" maybe. hmm. what-if we open this framework to a wider audience? For instance, openshift folks might be interested in some validations (I have Ceph in mind), and might find weird or even bad to have "tripleo-something" (with underscore or dashes). Maybe something more generic? "vf(-nova)" ? "validation-framework(-nova)" ? 
Or even "opendev-validation(-nova)" Since there are also a possibility to ask for a new package name for something more generic without the "tripleo" taint.. Cheers, C. > > Just chiming in here.. the pattern we like in OSA is using dashes for > tags, I think having something like 'tripleo-validations' and > 'tripleo-validations-nova' etc > >> Wdyt? >> -- >> Emilien Macchi > > > -- Cédric Jeanneret Software Engineer - OpenStack Platform Red Hat EMEA https://www.redhat.com/ -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From sorrison at gmail.com Wed May 8 07:51:18 2019 From: sorrison at gmail.com (Sam Morrison) Date: Wed, 8 May 2019 17:51:18 +1000 Subject: [cinder] Help with a review please Message-ID: <55F040AF-16C8-4029-B306-7E81B4BE191A@gmail.com> Hi, I’ve had a review going on for over 8 months now [1] and would love to get this in, it’s had +2s over the period and keeps getting nit picked, finally being knocked back due to no spec which there now is [2] This is now stalled itself after having a +2 and it is very depressing. I have had generally positive experiences contributing to openstack but this has been a real pain, is there something I can do to make this go smoother? Thanks, Sam [1] https://review.opendev.org/#/c/599866/ [2] https://review.opendev.org/#/c/645056/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From stig.openstack at telfer.org Wed May 8 08:11:31 2019 From: stig.openstack at telfer.org (Stig Telfer) Date: Wed, 8 May 2019 09:11:31 +0100 Subject: [scientific-sig] IRC Meeting today 1100 UTC: activity areas for Train cycle Message-ID: <1432B73C-C9C8-417D-9853-A268AA3D0325@telfer.org> Hello All - We have a Scientific SIG IRC meeting today at 1100 UTC in channel #openstack-meeting. Everyone is welcome. Today’s agenda is here: https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_May_8th_2019 After another busy session at the Open Infra Summit and a productive time at the PTG, we have a set of priority areas of focus identified. Cheers, Stig -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From melwittt at gmail.com Wed May 8 08:15:34 2019 From: melwittt at gmail.com (melanie witt) Date: Wed, 8 May 2019 04:15:34 -0400 Subject: [oslo][oslo-messaging][nova] Stein nova-api AMQP issue running under uWSGI In-Reply-To: <1537695d-fe31-3e48-36d7-566a92307a93@oracle.com> References: <229a2a53-870f-44c3-5e0c-6cfa9d45d0c5@oracle.com> <3275304e-d717-8b89-557e-b650fc4f661a@oracle.com> <20190420063850.GA18527@holtby.speedport.ip> <8b9cb0e4-b3a4-986a-be59-5bba6ae00f4e@nemebean.com> <20190503175904.GA26117@holtby> <8411da3c-9318-2189-5149-2beb9cab4bd0@nemebean.com> <1537695d-fe31-3e48-36d7-566a92307a93@oracle.com> Message-ID: On Tue, 7 May 2019 15:22:36 -0700, Iain Macdonnell wrote: > > > On 5/7/19 2:45 PM, Ben Nemec wrote: >> >> >> On 5/4/19 4:14 PM, Damien Ciabrini wrote: >>> >>> >>> On Fri, May 3, 2019 at 7:59 PM Michele Baldessari >> > wrote: >>> >>>     On Mon, Apr 22, 2019 at 01:21:03PM -0500, Ben Nemec wrote: >>>      > >>>      > >>>      > On 4/22/19 12:53 PM, Alex Schultz wrote: >>>      > > On Mon, Apr 22, 2019 at 11:28 AM Ben Nemec >>>     > wrote: >>>      > > > >>>      > > > >>>      > > > >>>      > > > On 4/20/19 1:38 AM, Michele Baldessari wrote: >>>      > > > > On Fri, Apr 19, 2019 at 03:20:44PM -0700, >>>     iain.macdonnell at oracle.com wrote: >>>      > > > > > >>>      > > > > > Today I discovered that this problem appears to be caused >>>     by eventlet >>>      > > > > > monkey-patching. I've created a bug for it: >>>      > > > > > >>>      > > > > > >>> https://urldefense.proofpoint.com/v2/url?u=https-3A__bugs.launchpad.net_nova_-2Bbug_1825584&d=DwIDaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=RxYkIjeLZPK2frXV_wEUCq8d3wvUIvDPimUcunMwbMs&m=vdmZv2wQnoFF1TIFnkN4XXdIjy0p4TKcsQ598Qbjti4&s=zgCsi2WthDNaeptBSW02iplSjxg9P_zrnfocp8P06oA&e= >>> >>>      > > > > >>>      > > > > Hi, >>>      > > > > >>>      > > > > just for completeness we see this very same issue also with >>>      > > > > mistral (actually it was the first service where we noticed >>>     the missed >>>      > > > > heartbeats). iirc Alex Schultz mentioned seeing it in >>>     ironic as well, >>>      > > > > although I have not personally observed it there yet. >>>      > > > >>>      > > > Is Mistral also mixing eventlet monkeypatching and WSGI? >>>      > > > >>>      > > >>>      > > Looks like there is monkey patching, however we noticed it >>> with the >>>      > > engine/executor. So it's likely not just wsgi.  I think I also >>>     saw it >>>      > > in the ironic-conductor, though I'd have to try it out >>> again.  I'll >>>      > > spin up an undercloud today and see if I can get a more >>>     complete list >>>      > > of affected services. It was pretty easy to reproduce. >>>      > >>>      > Okay, I asked because if there's no WSGI/Eventlet combination >>>     then this may >>>      > be different from the Nova issue that prompted this thread. It >>>     sounds like >>>      > that was being caused by a bad interaction between WSGI and some >>>     Eventlet >>>      > timers. If there's no WSGI involved then I wouldn't expect that >>>     to happen. >>>      > >>>      > I guess we'll see what further investigation turns up, but based >>>     on the >>>      > preliminary information there may be two bugs here. 
>>> >>>     So just to get some closure on this error that we have seen around >>>     mistral executor and tripleo with python3: this was due to the >>> ansible >>>     action that called subprocess which has a different implementation in >>>     python3 and so the monkeypatching needs to be adapted. >>> >>>     Review which fixes it for us is here: >>> >>> https://urldefense.proofpoint.com/v2/url?u=https-3A__review.opendev.org_-23_c_656901_&d=DwIDaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=RxYkIjeLZPK2frXV_wEUCq8d3wvUIvDPimUcunMwbMs&m=vdmZv2wQnoFF1TIFnkN4XXdIjy0p4TKcsQ598Qbjti4&s=1o81kC60gB8_5zIgi8WugZaOma_3m7grG4RQ-aVsbSE&e= >>> >>> >>>     Damien and I think the nova_api/eventlet/mod_wsgi has a separate >>>     root-cause >>>     (although we have not spent all too much time on that one yet) >>> >>> >>> Right, after further investigation, it appears that the problem we saw >>> under mod_wsgi was due to monkey patching, as Iain originally >>> reported. It has nothing to do with our work on healthchecks. >>> >>> It turns out that running the AMQP heartbeat thread under mod_wsgi >>> doesn't work when the threading library is monkey_patched, because the >>> thread waits on a data structure [1] that has been monkey patched [2], >>> which makes it yield its execution instead of sleeping for 15s. >>> >>> Because mod_wsgi stops the execution of its embedded interpreter, the >>> AMQP heartbeat thread can't be resumed until there's a message to be >>> processed in the mod_wsgi queue, which would resume the python >>> interpreter and make eventlet resume the thread. >>> >>> Disabling monkey-patching in nova_api makes the scheduling issue go >>> away. >> >> This sounds like the right long-term solution, but it seems unlikely to >> be backportable to the existing releases. As I understand it some >> nova-api functionality has an actual dependency on monkey-patching. Is >> there a workaround? Maybe periodically poking the API to wake up the >> wsgi interpreter? > > I've been pondering things like that ... but if I have multiple WSGI > processes, can I be sure that an API-poke will hit the one(s) that need it? > > This is a road-block for me upgrading to Stein. I really don't want to > have to go back to running nova-api standalone, but that's increasingly > looking like the only "safe" option :/ FWIW, I have a patch series that aims to re-eliminate the eventlet dependency in nova-api: https://review.opendev.org/657750 (top patch) if you might be able to give it a try. If it helps, then maybe we could backport to Stein if folks are in support. -melanie > > >>> Note: other services like heat-api do not use monkey patching and >>> aren't affected, so this seem to confirm that monkey-patching >>> shouldn't happen in nova_api running under mod_wsgi in the first >>> place. 
>>> >>> [1] >>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_openstack_oslo.messaging_blob_master_oslo-5Fmessaging_-5Fdrivers_impl-5Frabbit.py-23L904&d=DwIDaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=RxYkIjeLZPK2frXV_wEUCq8d3wvUIvDPimUcunMwbMs&m=vdmZv2wQnoFF1TIFnkN4XXdIjy0p4TKcsQ598Qbjti4&s=O5nQh1r8Zmded00yYMXrfxL44xcd9KqFK-VOa0cg6gs&e= >>> >>> [2] >>> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_openstack_oslo.utils_blob_master_oslo-5Futils_eventletutils.py-23L182&d=DwIDaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=RxYkIjeLZPK2frXV_wEUCq8d3wvUIvDPimUcunMwbMs&m=vdmZv2wQnoFF1TIFnkN4XXdIjy0p4TKcsQ598Qbjti4&s=QRkXCiqv6zcnO2b2p8Uv6cgRuu1R414B9SvILuugN6w&e= >>> >> > From bdobreli at redhat.com Wed May 8 08:47:34 2019 From: bdobreli at redhat.com (Bogdan Dobrelya) Date: Wed, 8 May 2019 10:47:34 +0200 Subject: [ops][nova]Logging in nova and other openstack projects In-Reply-To: <62034C21-91FC-4A9A-BC4B-47E372EAB925@planethoster.info> References: <62034C21-91FC-4A9A-BC4B-47E372EAB925@planethoster.info> Message-ID: On 07.05.2019 23:31, Jean-Philippe Méthot wrote: > Indeed, this is what was written in your original response as well as in > the documentation. As a result, it was fairly difficult to miss and I > did comment it out before restarting the service. Additionally, as per There is also a deprecated (but still working) log_config [0]. So please double-check you don't have that configuration leftover. Another caveat might be that SIGHUP does not propagate to all of the child processes/threads/whatever to update its logging configs with the new default_log_levels and removed log_config(_append) ones... But you said you are restarting not reloading, so prolly can't be a problem here. [0] https://opendev.org/openstack/oslo.log/src/branch/master/oslo_log/_options.py#L47 > the configuration I had set up, had the log-config-append option be set, > I wouldn’t have any INFO level log in my logs. Hence why I believe it is > strange that I have info level logs, when I’ve set default_log_levels > like this: > > default_log_levels > = amqp=WARN,amqplib=WARN,boto=WARN,qpid=WARN,sqlalchemy=WARN,suds=WARN,oslo.messaging=WARN,iso8601=WARN,requests.packages.urllib3.connectionpool=WARN,urllib3.connectionpool=WARN,websocket=WARN,requests.packages.urllib3.util.retry=WARN,urllib3.util.retry=WARN,keystonemiddleware=WARN,routes.middleware=WARN,stevedore=WARN,taskflow=WARN,keystoneauth=WARN,oslo.cache=WARN > > Please understand that I am not doubting that your previous answer > normally works. I have seen your presentations at past Openstack summit > and know that you are a brilliant individual. However, I can only answer > here that, from my observations, this is not working as intended. > > I’ll also add that this is on Pike, but we are slated to upgrade to > Queens in the coming weeks. > > Best regards, > > Jean-Philippe Méthot > Openstack system administrator > Administrateur système Openstack > PlanetHoster inc. > > > > >> Le 7 mai 2019 à 11:39, Jay Pipes > > a écrit : >> >> As mentioned in my original response, if you have >> CONF.log_config_append set to anything, then the other conf options >> related to logging will be ignored. >> >> Best, >> -jay >> >> On Tue, May 7, 2019, 11:15 AM Jean-Philippe Méthot >> > wrote: >> >> Hi, >> >> I’ve just tried setting everything to warn through the nova.conf >> option default_log_levels, as suggested. 
However, I’m still >> getting info level logs from the resource tracker like this : >> >> INFO nova.compute.resource_tracker >> >> Could the compute resource tracker logs be managed by another >> parameter than what’s in the default list for that configuration >> option? >> >> Best regards, >> >> Jean-Philippe Méthot >> Openstack system administrator >> Administrateur système Openstack >> PlanetHoster inc. >> >> >> >> >>> Le 7 mai 2019 à 09:02, Jay Pipes >> > a écrit : >>> >>> On 05/06/2019 05:56 PM, Jean-Philippe Méthot wrote: >>>> Hi, >>>> We’ve been modifying our login habits for Nova on our Openstack >>>> setup to try to send only warning level and up logs to our log >>>> servers. To do so, I’ve created a logging.conf and configured >>>> logging according to the logging module documentation. While >>>> what I’ve done works, it seems to be a very convoluted process >>>> for something as simple as changing the logging level to >>>> warning. We worry that if we upgrade and the syntax for this >>>> configuration file changes, we may have to push more changes >>>> through ansible than we would like to. >>> >>> It's unlikely that the syntax for the logging configuration file >>> will change since it's upstream Python, not OpenStack or Nova >>> that is the source of this syntax. >>> >>> That said, if all you want to do is change some or all package >>> default logging levels, you can change the value of the >>> CONF.default_log_levels option. >>> >>> The default_log_levels CONF option is actually derived from the >>> oslo_log package that is used by all OpenStack service projects. >>> It's default value is here: >>> >>> https://github.com/openstack/oslo.log/blob/29671ef2bfacb416d397abc57170bb090b116f68/oslo_log/_options.py#L19-L31 >>> >>> So, if you don't want to mess with the standard Python logging >>> conf, you can just change that CONF.default_log_levels option. >>> Note that if you do specify a logging config file using a >>> non-None CONF.log_config_append value, then all other logging >>> configuration options (like default_log_levels) are ignored). >>> >>> Best, >>> -jay >>> >> > -- Best regards, Bogdan Dobrelya, Irc #bogdando From bdobreli at redhat.com Wed May 8 09:18:22 2019 From: bdobreli at redhat.com (Bogdan Dobrelya) Date: Wed, 8 May 2019 11:18:22 +0200 Subject: [ironic][tripleo] My PTG & Forum notes In-Reply-To: <7313c6aa-1693-2cb0-4ed9-a73646764070@redhat.com> References: <7313c6aa-1693-2cb0-4ed9-a73646764070@redhat.com> Message-ID: <896f2331-139d-acfe-5115-248411eb6b35@redhat.com> On 07.05.2019 19:47, Dmitry Tantsur wrote: > Hi folks, > > I've published my personal notes from the PTG & Forum in Denver: > https://dtantsur.github.io/posts/ironic-denver-2019/ > They're probably opinionated and definitely not complete, but I still > think they could be useful. > > Also pasting the whole raw RST text below for ease of commenting. > > Cheers, > Dmitry > > > Keynotes > ======== > > The `Metal3`_ project got some spotlight during the keynotes. A > (successful!) > `live demo`_ was done that demonstrated using Ironic through Kubernetes > API to > drive provisioning of bare metal nodes. this is very interesting to consider for TripleO integration alongside (or alternatively?) standalone Ironic, see my note below > > The official `bare metal program`_ was announced to promote managing > bare metal > infrastructure via OpenStack. 
> > Forum: standalone Ironic > ======================== > > On Monday we had two sessions dedicated to the future development of > standalone > Ironic (without Nova or without any other OpenStack services). > > During the `standalone roadmap session`_ the audience identified two > potential > domains where we could provide simple alternatives to depending on > OpenStack > services: > > * Alternative authentication. It was mentioned, however, that Keystone is a >   relatively easy service to install and operate, so adding this to Ironic >   may not be worth the effort. > > * Multi-tenant networking without Neutron. We could use networking-ansible_ >   directly, since they are planning on providing a Python API > independent of >   their ML2 implementation. > > Next, firmware update support was a recurring topic (also in hallway > conversations and also in non-standalone context). Related to that, a > driver > feature matrix documentation was requested, so that such driver-specific > features are easier to discover. > > Then we had a separate `API multi-tenancy session`_. Three topic were > covered: > > * Wiring in the existing ``owner`` field for access control. > >   The idea is to allow operations for non-administrator users only to > nodes >   with ``owner`` equal to their project (aka tenant) ID. In the > non-keystone >   context this field would stay free-form. We did not agree whether we > need an >   option to enable this feature. > >   An interesting use case was mentioned: assign a non-admin user to > Nova to >   allocate it only a part of the bare metal pool instead of all nodes. > >   We did not reach a consensus on using a schema with the ``owner`` field, >   e.g. where ``keystone://{project ID}`` represents a Keystone project ID. > > * Adding a new field (e.g. ``deployed_by``) to track a user that requested >   deploy for auditing purposes. > >   We agreed that the ``owner`` field should not be used for this > purpose, and >   overall it should never be changed automatically by Ironic. > > * Adding some notion of *node leased to*, probably via a new field. > >   This proposal was not well defined during the session, but we > probably would >   allow some subset of API to lessees using the policy mechanism. It > became >   apparent that implementing a separate *deployment API endpoint* is > required >   to make such policy possible. > > Creating the deployment API was identified as a potential immediate action > item. Wiring the ``owner`` field can also be done in the Train cycle, if we > find volunteers to push it forward. > > PTG: scientific SIG > =================== > > The PTG started for me with the `Scientific SIG discussions`_ of desired > features and fixes in Ironic. > > The hottest topic was reducing the deployment time by reducing the > number of > reboots that are done during the provisioning process. `Ramdisk deploy`_ > was identified as a very promising feature to solve this, as well as enable > booting from remote volumes not supported directly by Ironic and/or Cinder. > A few SIG members committed to testing it as soon as possible. > > Two related ideas were proposed for later brainstorming: > > * Keeping some proportion of nodes always on and with IPA booted. This is >   basing directly on the `fast-track deploy`_ work completed in the Stein >   cycle. A third party orchestrator would be needed for keeping the > percentage, >   but Ironic will have to provide an API to boot an ``available`` node > into the >   ramdisk. 
> > * Allow using *kexec* to instantly switch into a freshly deployed operating >   system. > > Combined together, these features can allow zero-reboot deployments. > > PTG: Ironic > =========== > > Community sustainability > ------------------------ > > We seem to have a disbalance in reviews, with very few people handling the > majority of reviews, and some of them are close to burning out. > > * The first thing we discussed is simplifying the specs process. We > considered a >   single +2 approval for specs and/or documentation. Approving > documentation >   cannot break anyone, and follow-ups are easy, so it seems a good > idea. We did >   not reach a firm agreement on a single +2 approval for specs; I > personally >   feel that it would only move the bottleneck from specs to the code. > > * Facilitating deprecated feature removals can help clean up the code, > and it >   can often be done by new contributors. We would like to maintain a > list of >   what can be removed when, so that we don't forget it. > > * We would also like to switch to single +2 for stable backports. This > needs >   changing the stable policy, and Tony volunteered to propose it. > > We felt that we're adding cores at a good pace, Julia had been mentoring > people > that wanted it. We would like people to volunteer, then we can mentor > them into > core status. > > However, we were not so sure we wanted to increase the stable core team. > This > team is supposed to be a small number of people that know quite a few small > details of the stable policy (e.g. requirements changes). We thought we > should > better switch to single +2 approval for the existing team. > > Then we discussed moving away from WSME, which is barely maintained by a > team > of not really interested individuals. The proposal was to follow the > example of > Keystone and just move to Flask. We can use ironic-inspector as an > example, and > probably migrate part by part. JSON schema could replace WSME objects, > similarly to how Nova does it. I volunteered to come up with a plan to > switch, > and some folks from Intel expressed interest in participating. > > Standalone roadmap > ------------------ > > We started with a recap of items from `Forum: standalone Ironic`_. > > While discussing creating a driver matrix, we realized that we could keep > driver capabilities in the source code (similar to existing iSCSI boot) and > generate the documentation from it. Then we could go as far as exposing > this > information in the API. > > During the multi-tenancy discussion, the idea of owner and lessee fields > was > well received. Julia volunteered to write a specification for that. We > clarified the following access control policies implemented by default: > > * A user can list or show nodes if they are an administrator, an owner of a >   node or a leaser of this node. > * A user can deploy or undeploy a node (through the future deployment > API) if >   they are an administrator, an owner of this node or a lessee of this > node. > * A user can update a node or any of its resources if they are an > administrator >   or an owner of this node. A lessee of a node can **not** update it. > > The discussion of recording the user that did a deployment turned into > discussing introducing a searchable log of changes to node power and > provision > states. We did not reach a final consensus on it, and we probably need a > volunteer to push this effort forward. 
> > Deploy steps continued > ---------------------- > > This session was dedicated to making the deploy templates framework more > usable > in practice. > > * We need to implement support for in-band deploy steps (other than the >   built-in ``deploy.deploy`` step). We probably need to start IPA before >   proceeding with the steps, similarly to how it is done with cleaning. > > * We agreed to proceed with splitting the built-in core step, making it a >   regular deploy step, as well as removing the compatibility shim for > drivers >   that do not support deploy steps. We will probably separate writing > an image >   to disk, writing a configdrive and creating a bootloader. > >   The latter could be overridden to provide custom kernel parameters. > > * To handle potential differences between deploy steps in different > hardware >   types, we discussed the possibility of optionally including a > hardware type >   or interface name in a clean step. Such steps will only be run for > nodes with >   matching hardware type or interface. > > Mark and Ruby volunteered to write a new spec on these topics. > > Day 2 operational workflow > -------------------------- > > For deployments with external health monitoring, we need a way to represent > the state when a deployed node looks healthy from our side but is detected > as failed by the monitoring. > > It seems that we could introduce a new state transition from ``active`` to > something like ``failed`` or ``quarantined``, where a node is still > deployed, > but explicitly marked as at fault by an operator. On unprovisioning, > this node > would not become ``available`` automatically. We also considered the > possibility of using a flag instead of a new state, although the > operators in > the room were more in favor of using a state. We largely agreed that the > already overloaded ``maintenance`` flag should not be used for this. > > On the Nova side we would probably use the ``error`` state to reflect > nodes in > the new state. > > A very similar request had been done for node retirement support. We > decided to > look for a unified solution. > > DHCP-less deploy > ---------------- > > We discussed options to avoid relying on DHCP for deploying. > > * An existing specification proposes attaching IP information to virtual > media. >   The initial contributors had become inactive, so we decided to help > this work >   to go through. Volunteers are welcome. > > * As an alternative to that, we discussed using IPv6 SLAAC with > multicast DNS >   (routed across WAN for Edge cases). A couple of folks on the room > volunteered >   to help with testing. We need to fix python-zeroconf_ to support > IPv6, which >   is something I'm planning on. > > Nova room > --------- > > In a cross-project discussion with the Nova team we went through a few > topics: > > * Whether Nova should use new Ironic API to build config drives. Since > Ironic >   is not the only driver building config drives, we agreed that it > probably >   doesn't make much sense to change that. > > * We did not come to a conclusion on deprecating capabilities. We agreed > that >   Ironic has to provide alternatives for ``boot_option`` and ``boot_mode`` >   capabilities first. These will probably become deploy steps or built-in >   traits. > > * We agreed that we should switch Nova to using *openstacksdk* instead of >   *ironicclient* to access Ironic. This work had already been in progress. 
> > Faster deploy > ------------- > > We followed up to `PTG: scientific SIG`_ with potential action items on > speeding up the deployment process by reducing the number of reboots. We > discussed an ability to keep all or some nodes powered on and > heartbeating in > the ``available`` state: > > * Add an option to keep the ramdisk running after cleaning. > >   * For this to work with multi-tenant networking we'll need an IPA > command to >     reset networking. > > * Add a provisioning verb going from ``available`` to ``available`` > booting the >   node into IPA. > > * Make sure that pre-booted nodes are prioritized for scheduling. We will >   probably dynamically add a special trait. Then we'll have to update both >   Nova/Placement and the allocation API to support preferred (optional) > traits. > > We also agreed that we could provide an option to *kexec* instead of > rebooting > as an advanced deploy step for operators that really know their hardware. > Multi-tenant networking can be tricky in this case, since there is no safe > point to switch from deployment to tenant network. We will probably take > a best > effort approach: command IPA to shutdown all its functionality and > schedule a > *kexec* after some time. After that, switch to tenant networks. This is not > entirely secure, but will probably fit the operators (HPC) who requests it. > > Asynchronous clean steps > ------------------------ > > We discussed enhancements for asynchronous clean and deploy steps. > Currently > running a step asynchronously requires either polling in a loop (occupying > a green thread) or creating a new periodic task in a hardware type. We > came up > with two proposed updates for clean steps: > > * Allow a clean step to request re-running itself after certain amount of >   time. E.g. a clean step would do something like > >   .. code-block:: python > >     @clean_step(...) >     def wait_for_raid(self): >         if not raid_is_ready(): >             return RerunAfter(60) > >   and the conductor would schedule re-running the same step in 60 seconds. > > * Allow a clean step to spawn more clean steps. E.g. a clean step would >   do something like > >   .. code-block:: python > >     @clean_step(...) >     def create_raid_configuration(self): >         start_create_raid() >         return RunNext([{'step': 'wait_for_raid'}]) > >   and the conductor would insert the provided step to ``node.clean_steps`` >   after the current one and start running it. > >   This would allow for several follow-up steps as well. A use case is a > clean >   step for resetting iDRAC to a clean state that in turn consists of > several >   other clean steps. The idea of sub-steps was deemed too complicated. > > PTG: TripleO > ============ > > We discussed our plans for removing Nova from the TripleO undercloud and > moving bare metal provisioning from under control of Heat. The plan from > the I wish we could have Metal3 provisioning via K8s API adapted for Undercloud in TripleO. Probably via a) standalone kubelet or b) k3s [0]. The former provides only kubelet running static pods, no API server et al. The latter is a lightweight k8s distro (a 10MB memory footprint or so) and may be as well used to spawn some very limited kubelet and API server setup for Metal3 to drive the provisioning of overclouds outside of Heat and Neutron. 
[0] https://www.cnrancher.com/blog/2019/2019-02-26-introducing-k3s-the-lightweight-kubernetes-distribution-built-for-the-edge/ > `nova-less-deploy specification`_, as well as the current state > of the implementation, were presented. > > The current concerns are: > > * upgrades from a Nova based deployment (probably just wipe the Nova >   database), > * losing user experience of ``nova list`` (largely compensated by >   ``metalsmith list``), > * tracking IP addresses for networks other than *ctlplane* (solved the same >   way as for deployed servers). > > The next action item is to create a CI job based on the already merged > code and > verify a few assumptions made above. > > PTG: Ironic, Placement, Blazar > ============================== > > We reiterated over our plans to allow Ironic to optionally report nodes to > Placement. This will be turned off when Nova is present to avoid > conflicts with > the Nova reporting. We will optionally use Placement as a backend for > Ironic > allocation API (which is something that had been planned before). > > Then we discussed potentially exposing detailed bare metal inventory to > Placement. To avoid partial allocations, Placement could introduce new > API to > consume the whole resource provider. Ironic would use it when creating an > allocation. No specific commitments were made with regards to this idea. > > Finally we came with the following workflow for bare metal reservations in > Blazar: > > #. A user requests a bare metal reservation from Blazar. > #. Blazar fetches allocation candidates from Placement. > #. Blazar fetches a list of bare metal nodes from Ironic and filters out >    allocation candidates, whose resource provider UUID does not match > one of >    the node UUIDs. > #. Blazar remembers the node UUID and returns the reservation UUID to > the user. > > When the reservation time comes: > > #. Blazar creates an allocation in Ironic (not Placement) with the > candidate >    node matching previously picked node and allocation UUID matching the >    reservation UUID. > #. When the enhancements in `Standalone roadmap`_ are implemented, > Blazar will >    also set the node's lessee field to the user ID of the reservation, > so that >    Ironic allows access to this node. > #. A user fetches an Ironic allocation corresponding to the Blazar > reservation >    UUID and learns the node UUID from it. > #. A user proceeds with deploying the node. > > Side and hallway discussions > ============================ > > * We discussed having Heat resources for Ironic. We recommended the team to >   start with Allocation and Deployment resources (the latter being virtual >   until we implement the planned deployment API). > > * We prototyped how Heat resources for Ironic could look, including > Node, Port, >   Allocation and Deployment as a first step. > > .. _Metal3: http://metal3.io > .. _live demo: > https://www.openstack.org/videos/summits/denver-2019/openstack-ironic-and-bare-metal-infrastructure-all-abstractions-start-somewhere > > .. _bare metal program: https://www.openstack.org/bare-metal/ > .. _standalone roadmap session: > https://etherpad.openstack.org/p/DEN-train-next-steps-for-standalone-ironic > .. _networking-ansible: https://opendev.org/x/networking-ansible > .. _API multi-tenancy session: > https://etherpad.openstack.org/p/DEN-train-ironic-multi-tenancy > .. _Scientific SIG discussions: > https://etherpad.openstack.org/p/scientific-sig-ptg-train > .. 
_Ramdisk deploy: > https://docs.openstack.org/ironic/latest/admin/interfaces/deploy.html#ramdisk-deploy > > .. _fast-track deploy: https://storyboard.openstack.org/#!/story/2004965 > .. _python-zeroconf: https://github.com/jstasiak/python-zeroconf > .. _nova-less-deploy specification: > http://specs.openstack.org/openstack/tripleo-specs/specs/stein/nova-less-deploy.html > > -- Best regards, Bogdan Dobrelya, Irc #bogdando From geguileo at redhat.com Wed May 8 10:01:56 2019 From: geguileo at redhat.com (Gorka Eguileor) Date: Wed, 8 May 2019 12:01:56 +0200 Subject: [cinder] Help with a review please In-Reply-To: <55F040AF-16C8-4029-B306-7E81B4BE191A@gmail.com> References: <55F040AF-16C8-4029-B306-7E81B4BE191A@gmail.com> Message-ID: <20190508100156.ypwpt4ouxzw7r2ld@localhost> On 08/05, Sam Morrison wrote: > Hi, > > I’ve had a review going on for over 8 months now [1] and would love to get this in, it’s had +2s over the period and keeps getting nit picked, finally being knocked back due to no spec which there now is [2] > This is now stalled itself after having a +2 and it is very depressing. > > I have had generally positive experiences contributing to openstack but this has been a real pain, is there something I can do to make this go smoother? > > Thanks, > Sam > Hi Sam, I agree, it can be very frustrating when your patch gets somehow stuck in review, and while the spec and the patch looks good to me, I cannot say that I see much point in the feature itself. If the primary reason to add this new key-value pair in the API response is for aggregation, then the caller could do that same thing with an additional call to get the service list, where it could get the AZs of the different backends and then do the aggregation. To me that would be reasonable, since the AZ is not really a usage stat. Are there any other use cases? Cheers, Gorka. 
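A minimal sketch of the caller-side aggregation suggested above, assuming an already-authenticated python-cinderclient Client; per_backend_stats is only a placeholder for whatever per-backend data the caller wants to group, and the attribute names may differ slightly between releases:

    # Build a backend-host -> availability-zone map from the existing
    # service-list API, then group caller-side data by AZ.
    from collections import defaultdict

    def aggregate_by_az(cinder, per_backend_stats):
        # One cinder-volume service entry per backend,
        # e.g. host "cld-vol-01@ceph" in zone "nova".
        backend_az = {svc.host: svc.zone
                      for svc in cinder.services.list(binary='cinder-volume')}

        grouped = defaultdict(dict)
        for backend, stats in per_backend_stats.items():
            grouped[backend_az.get(backend, 'unknown')][backend] = stats
        return grouped

The same backend-to-zone mapping is visible from the CLI via cinder service-list, which reports a Zone per cinder-volume backend.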
> > [1] https://review.opendev.org/#/c/599866/ > [2] https://review.opendev.org/#/c/645056/ From massimo.sgaravatto at gmail.com Wed May 8 10:35:30 2019 From: massimo.sgaravatto at gmail.com (Massimo Sgaravatto) Date: Wed, 8 May 2019 12:35:30 +0200 Subject: [nova][ops] 'Duplicate entry for primary key' problem running nova-manage db archive_deleted_rows Message-ID: Hi Fron time to time I use to move entries related to deleted instances to shadow tables, using the command: nova-manage db archive_deleted_rows This is now failing [*] for the instance_metadata table because of a 'duplicate entry for the primary key' problem: DBDuplicateEntry: (pymysql.err.IntegrityError) (1062, u"Duplicate entry '6' for key 'PRIMARY'") [SQL: u'INSERT INTO shadow_instance_metadata (created_at, updated_at, deleted_at, deleted, id, `key`, value, instance_uuid) SELECT instance_metadata.created_at, instance_metadata.updated_at, instance_metadata.deleted_at, instance_metadata.deleted, instance_metadata.id, instance_metadata.`key`, instance_metadata.value, instance_metadata.instance_uuid \nFROM instance_metadata \nWHERE instance_metadata.deleted != %(deleted_1)s ORDER BY instance_metadata.id \n LIMIT %(param_1)s'] [parameters: {u'param_1': 1, u'deleted_1': 0}] Indeed: mysql> SELECT instance_metadata.created_at, instance_metadata.updated_at, instance_metadata.deleted_at, instance_metadata.deleted, instance_metadata.id, instance_metadata.`key`, instance_metadata.value, instance_metadata.instance_uuid FROM instance_metadata WHERE instance_metadata.deleted != 0 ORDER BY instance_metadata.id limit 1; +---------------------+------------+---------------------+---------+----+-------+-------+--------------------------------------+ | created_at | updated_at | deleted_at | deleted | id | key | value | instance_uuid | +---------------------+------------+---------------------+---------+----+-------+-------+--------------------------------------+ | 2018-09-20 07:40:56 | NULL | 2018-09-20 07:54:26 | 6 | 6 | group | node | a9000ff7-2298-454c-bf71-9e3c62ec0f3c | +---------------------+------------+---------------------+---------+----+-------+-------+--------------------------------------+ 1 row in set (0.00 sec) But there is a 5-years old entry (if I am not wrong we were running Havana at that time) in the shadow table with that id: mysql> select * from shadow_instance_metadata where id='6'; +---------------------+------------+---------------------+----+----------+---------------------+--------------------------------------+---------+ | created_at | updated_at | deleted_at | id | key | value | instance_uuid | deleted | +---------------------+------------+---------------------+----+----------+---------------------+--------------------------------------+---------+ | 2014-11-04 12:57:10 | NULL | 2014-11-04 13:06:45 | 6 | director | microbosh-openstack | 5db5b17b-69f2-4f0a-bdd2-efe710268021 | 6 | +---------------------+------------+---------------------+----+----------+---------------------+--------------------------------------+---------+ 1 row in set (0.00 sec) mysql> I wonder how could that happen. Can I simply remove that entry from the shadow table (I am not really interested to keep it) or are there better (cleaner) way to fix the problem ? 
This Cloud is now running Ocata Thanks, Massimo [*] [root at cld-ctrl-01 ~]# nova-manage db archive_deleted_rows --max_rows 1000 --verbose An error has occurred: Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/nova/cmd/manage.py", line 1617, in main ret = fn(*fn_args, **fn_kwargs) File "/usr/lib/python2.7/site-packages/nova/cmd/manage.py", line 691, in archive_deleted_rows run = db.archive_deleted_rows(max_rows) File "/usr/lib/python2.7/site-packages/nova/db/api.py", line 2040, in archive_deleted_rows return IMPL.archive_deleted_rows(max_rows=max_rows) File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 6564, in archive_deleted_rows tablename, max_rows=max_rows - total_rows_archived) File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 6513, in _archive_deleted_rows_for_table conn.execute(insert) File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 914, in execute return meth(self, multiparams, params) File "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection return connection._execute_clauseelement(self, multiparams, params) File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement compiled_sql, distilled_params File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1146, in _execute_context context) File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1337, in _handle_dbapi_exception util.raise_from_cause(newraise, exc_info) File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 200, in raise_from_cause reraise(type(exception), exception, tb=exc_tb) File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context context) File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute cursor.execute(statement, parameters) File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 165, in execute result = self._query(query) File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 321, in _query conn.query(q) File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 860, in query self._affected_rows = self._read_query_result(unbuffered=unbuffered) File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1061, in _read_query_result result.read() File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1349, in read first_packet = self.connection._read_packet() File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1018, in _read_packet packet.check_error() File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 384, in check_error err.raise_mysql_exception(self._data) File "/usr/lib/python2.7/site-packages/pymysql/err.py", line 107, in raise_mysql_exception raise errorclass(errno, errval) DBDuplicateEntry: (pymysql.err.IntegrityError) (1062, u"Duplicate entry '6' for key 'PRIMARY'") [SQL: u'INSERT INTO shadow_instance_metadata (created_at, updated_at, deleted_at, deleted, id, `key`, value, instance_uuid) SELECT instance_metadata.created_at, instance_metadata.updated_at, instance_metadata.deleted_at, instance_metadata.deleted, instance_metadata.id, instance_metadata.`key`, instance_metadata.value, instance_metadata.instance_uuid \nFROM instance_metadata \nWHERE instance_metadata.deleted != %(deleted_1)s ORDER BY instance_metadata.id \n LIMIT %(param_1)s'] [parameters: {u'param_1': 1, u'deleted_1': 0}] [root at 
cld-ctrl-01 ~]# -------------- next part -------------- An HTML attachment was scrubbed... URL: From dharmendra.kushwaha at india.nec.com Wed May 8 11:02:57 2019 From: dharmendra.kushwaha at india.nec.com (Dharmendra Kushwaha) Date: Wed, 8 May 2019 11:02:57 +0000 Subject: [Tacker] Train vPTG meetup schedule Message-ID: Dear all, We have planned to have our one-day virtual PTG meetup for Train cycle on below schedule. Please find the meeting details: Schedule: 14th May 2019, 08:00 UTC to 12:00 UTC Meeting Channel: https://bluejeans.com/614072564 Etherpad link: https://etherpad.openstack.org/p/Tacker-PTG-Train We will try to cover our topics within this schedule. If needed, we can extend it. Thanks & Regards Dharmendra Kushwaha From surya.seetharaman9 at gmail.com Wed May 8 11:50:47 2019 From: surya.seetharaman9 at gmail.com (Surya Seetharaman) Date: Wed, 8 May 2019 13:50:47 +0200 Subject: [nova][ops] 'Duplicate entry for primary key' problem running nova-manage db archive_deleted_rows In-Reply-To: References: Message-ID: Hi, On Wed, May 8, 2019 at 12:41 PM Massimo Sgaravatto < massimo.sgaravatto at gmail.com> wrote: > Hi > > Fron time to time I use to move entries related to deleted instances to > shadow tables, using the command: > > nova-manage db archive_deleted_rows > > This is now failing [*] for the instance_metadata table because of a > 'duplicate entry for the primary key' problem: > > DBDuplicateEntry: (pymysql.err.IntegrityError) (1062, u"Duplicate entry > '6' for key 'PRIMARY'") [SQL: u'INSERT INTO shadow_instance_metadata > (created_at, updated_at, deleted_at, deleted, id, `key`, value, > instance_uuid) SELECT instance_metadata.created_at, > instance_metadata.updated_at, instance_metadata.deleted_at, > instance_metadata.deleted, instance_metadata.id, instance_metadata.`key`, > instance_metadata.value, instance_metadata.instance_uuid \nFROM > instance_metadata \nWHERE instance_metadata.deleted != %(deleted_1)s ORDER > BY instance_metadata.id \n LIMIT %(param_1)s'] [parameters: {u'param_1': > 1, u'deleted_1': 0}] > > > Indeed: > > mysql> SELECT instance_metadata.created_at, instance_metadata.updated_at, > instance_metadata.deleted_at, instance_metadata.deleted, > instance_metadata.id, instance_metadata.`key`, instance_metadata.value, > instance_metadata.instance_uuid FROM instance_metadata WHERE > instance_metadata.deleted != 0 ORDER BY instance_metadata.id limit 1; > > +---------------------+------------+---------------------+---------+----+-------+-------+--------------------------------------+ > | created_at | updated_at | deleted_at | deleted | id | > key | value | instance_uuid | > > +---------------------+------------+---------------------+---------+----+-------+-------+--------------------------------------+ > | 2018-09-20 07:40:56 | NULL | 2018-09-20 07:54:26 | 6 | 6 | > group | node | a9000ff7-2298-454c-bf71-9e3c62ec0f3c | > > +---------------------+------------+---------------------+---------+----+-------+-------+--------------------------------------+ > 1 row in set (0.00 sec) > > > But there is a 5-years old entry (if I am not wrong we were running Havana > at that time) in the shadow table with that id: > > mysql> select * from shadow_instance_metadata where id='6'; > > +---------------------+------------+---------------------+----+----------+---------------------+--------------------------------------+---------+ > | created_at | updated_at | deleted_at | id | key | > value | instance_uuid | deleted | > > 
+---------------------+------------+---------------------+----+----------+---------------------+--------------------------------------+---------+ > | 2014-11-04 12:57:10 | NULL | 2014-11-04 13:06:45 | 6 | director | > microbosh-openstack | 5db5b17b-69f2-4f0a-bdd2-efe710268021 | 6 | > > +---------------------+------------+---------------------+----+----------+---------------------+--------------------------------------+---------+ > 1 row in set (0.00 sec) > > mysql> > > > I wonder how could that happen. > > Can I simply remove that entry from the shadow table (I am not really > interested to keep it) or are there better (cleaner) way to fix the problem > ? > > > This Cloud is now running Ocata > > Thanks, Massimo > > >From what I can understand, it looks like a record with id 6 was archived long back (havana-ish) and then there was a new record with id 6 again ready to be archived ? (not sure how there could have been two records with same id since ids are incremental even over releases, I am not sure of the history though since I wasn't involved with OS then). I think the only way out is to manually delete that entry from the shadow table if you don't want it. There should be no harm in removing it. We have a "nova-manage db purge [--all] [--before ] [--verbose] [--all-cells]" command that removes records from shadow_tables ( https://docs.openstack.org/nova/rocky/cli/nova-manage.html) but it was introduced in rocky. So it won't be available in Ocata unfortunately. Cheers, Surya. -------------- next part -------------- An HTML attachment was scrubbed... URL: From alifshit at redhat.com Wed May 8 12:46:56 2019 From: alifshit at redhat.com (Artom Lifshitz) Date: Wed, 8 May 2019 08:46:56 -0400 Subject: [nova][CI] GPUs in the gate In-Reply-To: <3587e05d-deab-42ad-9a02-4312ca11760f@www.fastmail.com> References: <3587e05d-deab-42ad-9a02-4312ca11760f@www.fastmail.com> Message-ID: On Tue, May 7, 2019 at 8:00 PM Clark Boylan wrote: > > On Tue, May 7, 2019, at 10:48 AM, Artom Lifshitz wrote: > > Hey all, > > > > Following up on the CI session during the PTG [1], I wanted to get the > > ball rolling on getting GPU hardware into the gate somehow. Initially > > the plan was to do it through OpenLab and by convincing NVIDIA to > > donate the cards, but after a conversation with Sean McGinnis it > > appears Infra have access to machines with GPUs. > > > > From Nova's POV, the requirements are: > > * The machines with GPUs should probably be Ironic baremetal nodes and > > not VMs [*]. > > * The GPUs need to support virtualization. It's hard to get a > > comprehensive list of GPUs that do, but Nova's own docs [2] mention > > two: Intel cards with GVT [3] and NVIDIA GRID [4]. > > > > So I think at this point the question is whether Infra can support > > those reqs. If yes, we can start concrete steps towards getting those > > machines used by a CI job. If not, we'll fall back to OpenLab and try > > to get them hardware. > > What we currently have access to is a small amount of Vexxhost's GPU instances (so mnaser can further clarify my comments here). I believe these are VMs with dedicated nvidia gpus that are passed through. I don't think they support the vgpu feature. > > It might help to describe the use case you are trying to meet rather than jumping ahead to requirements/solutions. That way maybe we can work with Vexxhost to better support what you need (or come up with some other solutions). 
For those of us that don't know all of the particulars it really does help if you can go from use case to requirements. Right, apologies, I got ahead of myself. The use case is CI coverage for Nova's VGPU feature. This feature can be summarized (and oversimplified) as "SRIOV for GPUs": a single physical GPU can be split into multiple virtual GPUs (via libvirt's mdev support [5]), each one being assigned to a different guest. We have functional tests in-tree, but no tests with real hardware. So we're looking for a way to get real hardware in the gate. I hope that clarifies things. Let me know if there are further questions. [5] https://libvirt.org/drvnodedev.html#MDEVCap > > > > > [*] Could we do double-passthrough? Could the card be passed through > > to the L1 guest via the PCI passthrough mechanism, and then into the > > L2 guest via the mdev mechanism? > > > > [1] https://etherpad.openstack.org/p/nova-ptg-train-ci > > [2] https://docs.openstack.org/nova/rocky/admin/virtual-gpu.html > > [3] https://01.org/igvt-g > > [4] https://docs.nvidia.com/grid/5.0/pdf/grid-vgpu-user-guide.pdf From rosmaita.fossdev at gmail.com Wed May 8 13:18:17 2019 From: rosmaita.fossdev at gmail.com (Brian Rosmaita) Date: Wed, 8 May 2019 09:18:17 -0400 Subject: [glance] 9 May meeting cancelled Message-ID: The Glance team had a very productive PTG and needs some recovery time, so there will be no weekly meeting tomorrow (Thursday 9 May). The weekly meetings will resume at their usual time (14:00 UTC) on Thursday 16 May. For any issues that can't wait, people will be available as usual in #openstack-glance -- and there's always the ML. cheers, brian From fungi at yuggoth.org Wed May 8 13:27:10 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 8 May 2019 13:27:10 +0000 Subject: [nova][CI] GPUs in the gate In-Reply-To: References: <3587e05d-deab-42ad-9a02-4312ca11760f@www.fastmail.com> Message-ID: <20190508132709.xgq6nz3mqkfw3q5d@yuggoth.org> On 2019-05-08 08:46:56 -0400 (-0400), Artom Lifshitz wrote: [...] > The use case is CI coverage for Nova's VGPU feature. This feature can > be summarized (and oversimplified) as "SRIOV for GPUs": a single > physical GPU can be split into multiple virtual GPUs (via libvirt's > mdev support [5]), each one being assigned to a different guest. We > have functional tests in-tree, but no tests with real hardware. So > we're looking for a way to get real hardware in the gate. [...] Long shot, but since you just need the feature provided and not the performance it usually implies, are there maybe any open source emulators which provide the same instruction set for conformance testing purposes? -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From jaypipes at gmail.com Wed May 8 13:35:36 2019 From: jaypipes at gmail.com (Jay Pipes) Date: Wed, 8 May 2019 09:35:36 -0400 Subject: [ops][nova]Logging in nova and other openstack projects In-Reply-To: <62034C21-91FC-4A9A-BC4B-47E372EAB925@planethoster.info> References: <62034C21-91FC-4A9A-BC4B-47E372EAB925@planethoster.info> Message-ID: Sorry for delayed response... comments inline. On 05/07/2019 05:31 PM, Jean-Philippe Méthot wrote: > Indeed, this is what was written in your original response as well as in > the documentation. As a result, it was fairly difficult to miss and I > did comment it out before restarting the service. 
Additionally, as per > the configuration I had set up, had the log-config-append option be set, > I wouldn’t have any INFO level log in my logs. Hence why I believe it is > strange that I have info level logs, when I’ve set default_log_levels > like this: > > default_log_levels > = amqp=WARN,amqplib=WARN,boto=WARN,qpid=WARN,sqlalchemy=WARN,suds=WARN,oslo.messaging=WARN,iso8601=WARN,requests.packages.urllib3.connectionpool=WARN,urllib3.connectionpool=WARN,websocket=WARN,requests.packages.urllib3.util.retry=WARN,urllib3.util.retry=WARN,keystonemiddleware=WARN,routes.middleware=WARN,stevedore=WARN,taskflow=WARN,keystoneauth=WARN,oslo.cache=WARN Do you see any of the above modules logging with INFO level, though? Or are you just seeing other modules (e.g. nova.*) logging at INFO level? If you are only seeing nova modules logging at INFO level, try adding: ,nova=WARN to the default_log_levels CONF option. Let us know if this works :) Best, -jay From rafaelweingartner at gmail.com Wed May 8 13:49:59 2019 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Wed, 8 May 2019 10:49:59 -0300 Subject: [telemetry] Team meeting agenda for tomorrow In-Reply-To: References: Message-ID: Hello Trinh, Where does the meeting happen? Will it be via IRC Telemetry channel? Or, in the Etherpad (https://etherpad.openstack.org/p/telemetry-meeting-agenda)? I would like to discuss and understand a bit better the context behind the Telemetry events deprecation. On Wed, May 8, 2019 at 12:19 AM Trinh Nguyen wrote: > Hi team, > > As planned, we will have a team meeting at 02:00 UTC, May 9th on > #openstack-telemetry to discuss what we gonna do for the next milestone > (Train-1) and continue what we left off from the last meeting. > > I put here [1] the agenda thinking that it should be fine for an hour > meeting. If you have anything to talk about, please put it there too. > > [1] https://etherpad.openstack.org/p/telemetry-meeting-agenda > > > Bests, > > -- > *Trinh Nguyen* > *www.edlab.xyz * > > -- Rafael Weingärtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Wed May 8 14:12:13 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 8 May 2019 09:12:13 -0500 Subject: [placement][nova][ptg] Summary: Consumer Types In-Reply-To: References: <1557135206.12068.1@smtp.office365.com> Message-ID: <93df3b21-149c-d32b-54d0-614597d4d754@gmail.com> On 5/6/2019 10:49 AM, Chris Dent wrote: >>> Still nova might want to fix this placement data inconsistency. I >>> guess the new placement microversion will allow to update the consumer >>> type of an allocation. >> >> Yeah, I think this has to be updated from Nova. I (and I imagine others) >> would like to avoid making the type field optional in the API. So maybe >> default the value to something like "incomplete" or "unknown" and then >> let nova correct this naturally for instances on host startup and >> migrations on complete/revert. Ideally nova will be one one of the users >> that wants to depend on the type string, so we want to use our knowledge >> of which is which to get existing allocations updated so we can depend >> on the type value later. > > Ah, okay, good. If something like "unknown" is workable I think > that's much much better than defaulting to instance. Thanks. Yup I agree with everything said from a nova perspective. Our public cloud operators were just asking about leaked allocations and if there was tooling to report and clean that kind of stuff up. 
I explained we have the heal_allocations CLI but that's only going to create allocations for *instances* and only if those instances aren't deleted, but we don't have anything in nova that deals with detection and cleanup of leaked allocations, sort of like what this tooling does [1] but I think is different. So I was thinking about how we could write something in nova that reads the allocations from placement and checks to see if there is anything in there that doesn't match what we have for instances or migrations, i.e. the server was deleted but for whatever reason an allocation was leaked. To be able to determine what allocations are nova-specific today we'd have to guess based on the resource classes being used, namely VCPU and/or MEMORY_MB, but it of course gets more complicated once we start adding supported for nested allocations and such. So consumer type will help here, but we need it more than from the GET /usages API I think. If I were writing that kind of report/cleanup tool today, I'd probably want a GET /allocations API, but that might be too heavy (it would definitely require paging support I think). I could probably get by with using GET /resource_providers/{uuid}/allocations for each compute node we have in nova, but again that starts to get complicated with nested providers (what if the allocations are for VGPU?). Anyway, from a "it's better to have something than nothing at all" perspective it's probably easiest to just start with the easy thing and ask placement for allocations on all compute node providers and cross-check those consumers against what's in nova and if we find allocations that don't have a matching migration or instance we could optional delete them. [1] https://github.com/larsks/os-placement-tools -- Thanks, Matt From fungi at yuggoth.org Wed May 8 14:27:58 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 8 May 2019 14:27:58 +0000 Subject: [all][requirements][stable] requests version bump on stable brances {pike|queens} for CVE-2018-18074 In-Reply-To: References: <20190507203022.ctlwkqh4awa5z3ez@mthode.org> Message-ID: <20190508142758.gbio47mo3f7pfpgz@yuggoth.org> On 2019-05-07 22:50:21 +0200 (+0200), Dirk Müller wrote: > Am Di., 7. Mai 2019 um 22:30 Uhr schrieb Matthew Thode : > > > Pike - 2.18.2 -> 2.20.1 - https://review.opendev.org/640727 > > Queens - 2.18.4 -> 2.20.1 - https://review.opendev.org/640710 > > Specifically it looks like we're already at the next issue, as tracked here: > > https://github.com/kennethreitz/requests/issues/5065 > > Any concerns from anyone on these newer urllib3 updates? I guess we'll > do them a bit later though. It's still unclear to me why we're doing this at all. Our stable constraints lists are supposed to be a snapshot in time from when we released, modulo stable point release updates of the libraries we're maintaining. Agreeing to bump random dependencies on stable branches because of security vulnerabilities in them is a slippery slope toward our users expecting the project to be on top of vulnerability announcements for every one of the ~600 packages in our constraints list. Deployment projects already should not depend on our requirements team tracking security vulnerabilities, so need to have a mechanism to override constraints entries anyway if they're making such guarantees to their users (and I would also caution against doing that too). 
Distributions are far better equipped than our project to handle such tracking, as they generally get advance notice of vulnerabilities and selectively backport fixes for them. Trying to accomplish the same with a mix of old and new dependency versions in our increasingly aging stable and extended maintenance branches seems like a disaster waiting to happen. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From fungi at yuggoth.org Wed May 8 14:39:23 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 8 May 2019 14:39:23 +0000 Subject: [nova][all][ptg] Summary: Same-Company Approvals In-Reply-To: References: Message-ID: <20190508143923.bhmla62qi2p7yc7s@yuggoth.org> On 2019-05-07 15:06:10 -0500 (-0500), Jay Bryant wrote: > Cinder has been working with the same unwritten rules for quite some time as > well with minimal issues. > > I think the concerns about not having it documented are warranted.  We have > had question about it in the past with no documentation to point to.  It is > more or less lore that has been passed down over the releases.  :-) > > At a minimum, having this e-mail thread is helpful.  If, however, we decide > to document it I think we should have it consistent across the teams that > use the rule.  I would be happy to help draft/review any such documentation. [...] I have a feeling that a big part of why it's gone undocumented for so long is that putting it in writing risks explicitly sending the message that we don't trust our contributors to act in the best interests of the project even if those are not aligned with the interests of their employer/sponsor. I think many of us attempt to avoid having all activity on a given patch come from people with the same funding affiliation so as to avoid giving the impression that any one organization is able to ram changes through with no oversight, but more because of the outward appearance than because we don't trust ourselves or our colleagues. Documenting our culture is a good thing, but embodying that documentation with this sort of nuance can be challenging. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From dangtrinhnt at gmail.com Wed May 8 14:41:07 2019 From: dangtrinhnt at gmail.com (Trinh Nguyen) Date: Wed, 8 May 2019 23:41:07 +0900 Subject: [telemetry] Team meeting agenda for tomorrow In-Reply-To: References: Message-ID: Hi Rafael, The meeting will be held on the IRC channel #openstack-telemetry as mentioned in the previous email. Thanks, On Wed, May 8, 2019 at 10:50 PM Rafael Weingärtner < rafaelweingartner at gmail.com> wrote: > Hello Trinh, > Where does the meeting happen? Will it be via IRC Telemetry channel? Or, > in the Etherpad (https://etherpad.openstack.org/p/telemetry-meeting-agenda)? > I would like to discuss and understand a bit better the context behind the Telemetry > events deprecation. > > On Wed, May 8, 2019 at 12:19 AM Trinh Nguyen > wrote: > >> Hi team, >> >> As planned, we will have a team meeting at 02:00 UTC, May 9th on >> #openstack-telemetry to discuss what we gonna do for the next milestone >> (Train-1) and continue what we left off from the last meeting. >> >> I put here [1] the agenda thinking that it should be fine for an hour >> meeting. If you have anything to talk about, please put it there too. 
>> >> [1] https://etherpad.openstack.org/p/telemetry-meeting-agenda >> >> >> Bests, >> >> -- >> *Trinh Nguyen* >> *www.edlab.xyz * >> >> > > -- > Rafael Weingärtner > -- *Trinh Nguyen* *www.edlab.xyz * -------------- next part -------------- An HTML attachment was scrubbed... URL: From jp.methot at planethoster.info Wed May 8 14:43:41 2019 From: jp.methot at planethoster.info (=?utf-8?Q?Jean-Philippe_M=C3=A9thot?=) Date: Wed, 8 May 2019 10:43:41 -0400 Subject: [ops][nova]Logging in nova and other openstack projects In-Reply-To: References: <62034C21-91FC-4A9A-BC4B-47E372EAB925@planethoster.info> Message-ID: <53BF2204-988C-4ED6-A687-F6188B90C547@planethoster.info> Hi, Indeed, the remaining info messages were coming from the nova-compute resource tracker. Adding nova=WARN in the list did remove these messages. Thank you very much for your help. Best regards, Jean-Philippe Méthot Openstack system administrator Administrateur système Openstack PlanetHoster inc. > Le 8 mai 2019 à 09:35, Jay Pipes a écrit : > > Sorry for delayed response... comments inline. > > On 05/07/2019 05:31 PM, Jean-Philippe Méthot wrote: >> Indeed, this is what was written in your original response as well as in the documentation. As a result, it was fairly difficult to miss and I did comment it out before restarting the service. Additionally, as per the configuration I had set up, had the log-config-append option be set, I wouldn’t have any INFO level log in my logs. Hence why I believe it is strange that I have info level logs, when I’ve set default_log_levels like this: >> default_log_levels = amqp=WARN,amqplib=WARN,boto=WARN,qpid=WARN,sqlalchemy=WARN,suds=WARN,oslo.messaging=WARN,iso8601=WARN,requests.packages.urllib3.connectionpool=WARN,urllib3.connectionpool=WARN,websocket=WARN,requests.packages.urllib3.util.retry=WARN,urllib3.util.retry=WARN,keystonemiddleware=WARN,routes.middleware=WARN,stevedore=WARN,taskflow=WARN,keystoneauth=WARN,oslo.cache=WARN > > Do you see any of the above modules logging with INFO level, though? Or are you just seeing other modules (e.g. nova.*) logging at INFO level? > > If you are only seeing nova modules logging at INFO level, try adding: > > ,nova=WARN > > to the default_log_levels CONF option. > > Let us know if this works :) > > Best, > -jay > -------------- next part -------------- An HTML attachment was scrubbed... URL: From surya.seetharaman9 at gmail.com Wed May 8 14:47:14 2019 From: surya.seetharaman9 at gmail.com (Surya Seetharaman) Date: Wed, 8 May 2019 16:47:14 +0200 Subject: [placement][nova][ptg] Summary: Consumer Types In-Reply-To: References: <1557135206.12068.1@smtp.office365.com> Message-ID: On Mon, May 6, 2019 at 5:51 PM Chris Dent wrote: > On Mon, 6 May 2019, Dan Smith wrote: > > >> Still nova might want to fix this placement data inconsistency. I > >> guess the new placement microversion will allow to update the consumer > >> type of an allocation. > > > > Yeah, I think this has to be updated from Nova. I (and I imagine others) > > would like to avoid making the type field optional in the API. So maybe > > default the value to something like "incomplete" or "unknown" and then > > let nova correct this naturally for instances on host startup and > > migrations on complete/revert. Ideally nova will be one one of the users > > that wants to depend on the type string, so we want to use our knowledge > > of which is which to get existing allocations updated so we can depend > > on the type value later. > > Ah, okay, good. 
If something like "unknown" is workable I think > that's much much better than defaulting to instance. Thanks. > okay, the spec will take this approach then. Regards, Surya. -------------- next part -------------- An HTML attachment was scrubbed... URL: From dtantsur at redhat.com Wed May 8 14:56:06 2019 From: dtantsur at redhat.com (Dmitry Tantsur) Date: Wed, 8 May 2019 16:56:06 +0200 Subject: [ironic][tripleo] My PTG & Forum notes In-Reply-To: <896f2331-139d-acfe-5115-248411eb6b35@redhat.com> References: <7313c6aa-1693-2cb0-4ed9-a73646764070@redhat.com> <896f2331-139d-acfe-5115-248411eb6b35@redhat.com> Message-ID: <80510197-88fe-e4c0-6cd8-d68e2b38e28c@redhat.com> On 5/8/19 11:18 AM, Bogdan Dobrelya wrote: > On 07.05.2019 19:47, Dmitry Tantsur wrote: >> Hi folks, >> >> I've published my personal notes from the PTG & Forum in Denver: >> https://dtantsur.github.io/posts/ironic-denver-2019/ >> They're probably opinionated and definitely not complete, but I still think >> they could be useful. >> >> Also pasting the whole raw RST text below for ease of commenting. >> >> Cheers, >> Dmitry >> >> >> Keynotes >> ======== >> >> The `Metal3`_ project got some spotlight during the keynotes. A (successful!) >> `live demo`_ was done that demonstrated using Ironic through Kubernetes API to >> drive provisioning of bare metal nodes. > > this is very interesting to consider for TripleO integration alongside (or > alternatively?) standalone Ironic, see my note below > >> >> The official `bare metal program`_ was announced to promote managing bare metal >> infrastructure via OpenStack. >> >> >> PTG: TripleO >> ============ >> >> We discussed our plans for removing Nova from the TripleO undercloud and >> moving bare metal provisioning from under control of Heat. The plan from the > > I wish we could have Metal3 provisioning via K8s API adapted for Undercloud in > TripleO. Probably via a) standalone kubelet or b) k3s [0]. > The former provides only kubelet running static pods, no API server et al. The > latter is a lightweight k8s distro (a 10MB memory footprint or so) and may be as > well used to spawn some very limited kubelet and API server setup for Metal3 to > drive the provisioning of overclouds outside of Heat and Neutron. We could use Metal3, but it will definitely change user experience beyond the point of recognition and rule out upgrades. With the current effort we're trying to keep the user interactions similar and upgrades still possible. Dmitry > > [0] > https://www.cnrancher.com/blog/2019/2019-02-26-introducing-k3s-the-lightweight-kubernetes-distribution-built-for-the-edge/ > > >> `nova-less-deploy specification`_, as well as the current state >> of the implementation, were presented. >> >> The current concerns are: >> >> * upgrades from a Nova based deployment (probably just wipe the Nova >>    database), >> * losing user experience of ``nova list`` (largely compensated by >>    ``metalsmith list``), >> * tracking IP addresses for networks other than *ctlplane* (solved the same >>    way as for deployed servers). >> >> The next action item is to create a CI job based on the already merged code and >> verify a few assumptions made above. 
>> From massimo.sgaravatto at gmail.com Wed May 8 15:04:10 2019 From: massimo.sgaravatto at gmail.com (Massimo Sgaravatto) Date: Wed, 8 May 2019 17:04:10 +0200 Subject: [nova][ops] 'Duplicate entry for primary key' problem running nova-manage db archive_deleted_rows In-Reply-To: References: Message-ID: The problem is not for that single entry Looks like the auto_increment for that table was reset (I don't know when-how) Cheers, Massimo On Wed, May 8, 2019 at 1:50 PM Surya Seetharaman < surya.seetharaman9 at gmail.com> wrote: > Hi, > > On Wed, May 8, 2019 at 12:41 PM Massimo Sgaravatto < > massimo.sgaravatto at gmail.com> wrote: > >> Hi >> >> Fron time to time I use to move entries related to deleted instances to >> shadow tables, using the command: >> >> nova-manage db archive_deleted_rows >> >> This is now failing [*] for the instance_metadata table because of a >> 'duplicate entry for the primary key' problem: >> >> DBDuplicateEntry: (pymysql.err.IntegrityError) (1062, u"Duplicate entry >> '6' for key 'PRIMARY'") [SQL: u'INSERT INTO shadow_instance_metadata >> (created_at, updated_at, deleted_at, deleted, id, `key`, value, >> instance_uuid) SELECT instance_metadata.created_at, >> instance_metadata.updated_at, instance_metadata.deleted_at, >> instance_metadata.deleted, instance_metadata.id, >> instance_metadata.`key`, instance_metadata.value, >> instance_metadata.instance_uuid \nFROM instance_metadata \nWHERE >> instance_metadata.deleted != %(deleted_1)s ORDER BY instance_metadata.id >> \n LIMIT %(param_1)s'] [parameters: {u'param_1': 1, u'deleted_1': 0}] >> >> >> Indeed: >> >> mysql> SELECT instance_metadata.created_at, instance_metadata.updated_at, >> instance_metadata.deleted_at, instance_metadata.deleted, >> instance_metadata.id, instance_metadata.`key`, instance_metadata.value, >> instance_metadata.instance_uuid FROM instance_metadata WHERE >> instance_metadata.deleted != 0 ORDER BY instance_metadata.id limit 1; >> >> +---------------------+------------+---------------------+---------+----+-------+-------+--------------------------------------+ >> | created_at | updated_at | deleted_at | deleted | id | >> key | value | instance_uuid | >> >> +---------------------+------------+---------------------+---------+----+-------+-------+--------------------------------------+ >> | 2018-09-20 07:40:56 | NULL | 2018-09-20 07:54:26 | 6 | 6 | >> group | node | a9000ff7-2298-454c-bf71-9e3c62ec0f3c | >> >> +---------------------+------------+---------------------+---------+----+-------+-------+--------------------------------------+ >> 1 row in set (0.00 sec) >> >> >> But there is a 5-years old entry (if I am not wrong we were running >> Havana at that time) in the shadow table with that id: >> >> mysql> select * from shadow_instance_metadata where id='6'; >> >> +---------------------+------------+---------------------+----+----------+---------------------+--------------------------------------+---------+ >> | created_at | updated_at | deleted_at | id | key >> | value | instance_uuid | deleted | >> >> +---------------------+------------+---------------------+----+----------+---------------------+--------------------------------------+---------+ >> | 2014-11-04 12:57:10 | NULL | 2014-11-04 13:06:45 | 6 | director >> | microbosh-openstack | 5db5b17b-69f2-4f0a-bdd2-efe710268021 | 6 | >> >> +---------------------+------------+---------------------+----+----------+---------------------+--------------------------------------+---------+ >> 1 row in set (0.00 sec) >> >> mysql> >> >> >> I wonder how could 
that happen. >> >> Can I simply remove that entry from the shadow table (I am not really >> interested to keep it) or are there better (cleaner) way to fix the problem >> ? >> >> >> This Cloud is now running Ocata >> >> Thanks, Massimo >> >> > From what I can understand, it looks like a record with id 6 was archived > long back (havana-ish) and then there was a new record with id 6 again > ready to be archived ? (not sure how there could have been two records with > same id since ids are incremental even over releases, I am not sure of the > history though since I wasn't involved with OS then). I think the only way > out is to manually delete that entry from the shadow table if you don't > want it. There should be no harm in removing it. > > We have a "nova-manage db purge [--all] [--before ] [--verbose] > [--all-cells]" command that removes records from shadow_tables ( > https://docs.openstack.org/nova/rocky/cli/nova-manage.html) but it was > introduced in rocky. So it won't be available in Ocata unfortunately. > > Cheers, > Surya. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From waboring at hemna.com Wed May 8 15:04:59 2019 From: waboring at hemna.com (Walter Boring) Date: Wed, 8 May 2019 11:04:59 -0400 Subject: [cinder] Python3 requirements for Train Message-ID: Hello Cinder folks, The train release is going to be the last release of OpenStack with python 2 support. Train also is going to require supporting python 3.6 and 3.7. This means that we should be enabling and or switching over all of our 3rd party CI runs to python 3 to ensure that our drivers and all of their required libraries run properly in a python 3.6/3.7 environment. This will help driver maintainers discover any python3 incompatibilities with their driver as well as any required libraries. At the PTG in Denver, the cinder team agreed that we wanted driver CI systems to start using python3 by milestone 2 for Train. This would be the July 22-26th time frame [1]. We are also working on adding driver library requirements to the OpenStack global requirements project [2] [3]. This effort will provide native install primitives for driver libraries in cinder. This process also requires the driver libraries to run in python3.6/3.7. The Cinder team wants to maintain it's high quality of driver support in the train release. By enabling python 3.6 and python 3.7 in CI tests, this will help everyone ship Cinder with the required support in Train and the following releases. Walt [1] https://releases.openstack.org/train/schedule.html [2] https://review.opendev.org/#/c/656724/ [3] https://review.opendev.org/#/c/657395/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From rleander at redhat.com Thu May 2 04:35:10 2019 From: rleander at redhat.com (Rain Leander) Date: Wed, 1 May 2019 22:35:10 -0600 Subject: [ptg] Interviews at PTG Denver Message-ID: Hello all! I'm attending PTG this week to conduct project interviews [0]. These interviews have several purposes. Please consider all of the following when thinking about what you might want to say in your interview: * Tell the users/customers/press what you've been working on in Rocky * Give them some idea of what's (what might be?) 
coming in Stein * Put a human face on the OpenStack project and encourage new participants to join us * You're welcome to promote your company's involvement in OpenStack but we ask that you avoid any kind of product pitches or job recruitment In the interview I'll ask some leading questions and it'll go easier if you've given some thought to them ahead of time: * Who are you? (Your name, your employer, and the project(s) on which you are active.) * What did you accomplish in Rocky? (Focus on the 2-3 things that will be most interesting to cloud operators) * What do you expect to be the focus in Stein? (At the time of your interview, it's likely that the meetings will not yet have decided anything firm. That's ok.) * Anything further about the project(s) you work on or the OpenStack community in general. Finally, note that there are only 40 interview slots available, so please consider coordinating with your project to designate the people that you want to represent the project, so that we don't end up with 12 interview about Neutron, or whatever. I mean, love me some Neutron, but twelve interviews is a bit too many, eh? It's fine to have multiple people in one interview - Maximum 3, probably. Interview slots are 30 minutes, in which time we hope to capture somewhere between 10 and 20 minutes of content. It's fine to run shorter but 15 minutes is probably an ideal length. See you SOON! [0] https://docs.google.com/spreadsheets/d/1xZosqEL_iRI1Q-A5j-guRVh6Gc8-rRQ6SqKvHZHHxBg/edit?usp=sharing -- K Rain Leander OpenStack Community Liaison Open Source and Standards Team https://www.rdoproject.org/ http://community.redhat.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From sneha.rai at hpe.com Thu May 2 08:07:18 2019 From: sneha.rai at hpe.com (RAI, SNEHA) Date: Thu, 2 May 2019 08:07:18 +0000 Subject: Help needed to Support Multi-attach feature Message-ID: Hi Team, I am currently working on multiattach feature for HPE 3PAR cinder driver. For this, while setting up devstack(on stable/queens) I made below change in the local.conf [[local|localrc]] ENABLE_VOLUME_MULTIATTACH=True ENABLE_UBUNTU_CLOUD_ARCHIVE=False /etc/cinder/cinder.conf: [3pariscsi_1] hpe3par_api_url = https://192.168.1.7:8080/api/v1 hpe3par_username = user hpe3par_password = password san_ip = 192.168.1.7 san_login = user san_password = password volume_backend_name = 3pariscsi_1 hpe3par_cpg = my_cpg hpe3par_iscsi_ips = 192.168.11.2,192.168.11.3 volume_driver = cinder.volume.drivers.hpe.hpe_3par_iscsi.HPE3PARISCSIDriver hpe3par_iscsi_chap_enabled = True hpe3par_debug = True image_volume_cache_enabled = True /etc/cinder/policy.json: 'volume:multiattach': 'rule:admin_or_owner' Added https://review.opendev.org/#/c/560067/2/cinder/volume/drivers/hpe/hpe_3par_common.py change in the code. But I am getting below error in the nova log: Apr 29 04:23:04 CSSOSBE04-B09 nova-compute[31396]: ERROR nova.compute.manager [None req-2cda6e90-fd45-4bfe-960a-7fca9ba4abab demo admin] [instance: fcaa5a47-fc48-489d-9827-6533bfd1a9fa] Instance failed block device setup: MultiattachNotSupportedByVirtDriver: Volume dc25f09a-6ae1-4b06-a814-73a8afaba62f has 'multiattach' set, which is not supported for this instance. 
Apr 29 04:23:04 CSSOSBE04-B09 nova-compute[31396]: ERROR nova.compute.manager [instance: fcaa5a47-fc48-489d-9827-6533bfd1a9fa] Traceback (most recent call last): Apr 29 04:23:04 CSSOSBE04-B09 nova-compute[31396]: ERROR nova.compute.manager [instance: fcaa5a47-fc48-489d-9827-6533bfd1a9fa] File "/opt/stack/nova/nova/compute/manager.py", line 1615, in _prep_block_device Apr 29 04:23:04 CSSOSBE04-B09 nova-compute[31396]: ERROR nova.compute.manager [instance: fcaa5a47-fc48-489d-9827-6533bfd1a9fa] wait_func=self._await_block_device_map_created) Apr 29 04:23:04 CSSOSBE04-B09 nova-compute[31396]: ERROR nova.compute.manager [instance: fcaa5a47-fc48-489d-9827-6533bfd1a9fa] File "/opt/stack/nova/nova/virt/block_device.py", line 840, in attach_block_devices Apr 29 04:23:04 CSSOSBE04-B09 nova-compute[31396]: ERROR nova.compute.manager [instance: fcaa5a47-fc48-489d-9827-6533bfd1a9fa] _log_and_attach(device) Apr 29 04:23:04 CSSOSBE04-B09 nova-compute[31396]: ERROR nova.compute.manager [instance: fcaa5a47-fc48-489d-9827-6533bfd1a9fa] File "/opt/stack/nova/nova/virt/block_device.py", line 837, in _log_and_attach Apr 29 04:23:04 CSSOSBE04-B09 nova-compute[31396]: ERROR nova.compute.manager [instance: fcaa5a47-fc48-489d-9827-6533bfd1a9fa] bdm.attach(*attach_args, **attach_kwargs) Apr 29 04:23:04 CSSOSBE04-B09 nova-compute[31396]: ERROR nova.compute.manager [instance: fcaa5a47-fc48-489d-9827-6533bfd1a9fa] File "/opt/stack/nova/nova/virt/block_device.py", line 46, in wrapped Apr 29 04:23:04 CSSOSBE04-B09 nova-compute[31396]: ERROR nova.compute.manager [instance: fcaa5a47-fc48-489d-9827-6533bfd1a9fa] ret_val = method(obj, context, *args, **kwargs) Apr 29 04:23:04 CSSOSBE04-B09 nova-compute[31396]: ERROR nova.compute.manager [instance: fcaa5a47-fc48-489d-9827-6533bfd1a9fa] File "/opt/stack/nova/nova/virt/block_device.py", line 620, in attach Apr 29 04:23:04 CSSOSBE04-B09 nova-compute[31396]: ERROR nova.compute.manager [instance: fcaa5a47-fc48-489d-9827-6533bfd1a9fa] virt_driver, do_driver_attach) Apr 29 04:23:04 CSSOSBE04-B09 nova-compute[31396]: ERROR nova.compute.manager [instance: fcaa5a47-fc48-489d-9827-6533bfd1a9fa] File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 274, in inner Apr 29 04:23:04 CSSOSBE04-B09 nova-compute[31396]: ERROR nova.compute.manager [instance: fcaa5a47-fc48-489d-9827-6533bfd1a9fa] return f(*args, **kwargs) Apr 29 04:23:04 CSSOSBE04-B09 nova-compute[31396]: ERROR nova.compute.manager [instance: fcaa5a47-fc48-489d-9827-6533bfd1a9fa] File "/opt/stack/nova/nova/virt/block_device.py", line 617, in _do_locked_attach Apr 29 04:23:04 CSSOSBE04-B09 nova-compute[31396]: ERROR nova.compute.manager [instance: fcaa5a47-fc48-489d-9827-6533bfd1a9fa] self._do_attach(*args, **_kwargs) Apr 29 04:23:04 CSSOSBE04-B09 nova-compute[31396]: ERROR nova.compute.manager [instance: fcaa5a47-fc48-489d-9827-6533bfd1a9fa] File "/opt/stack/nova/nova/virt/block_device.py", line 602, in _do_attach Apr 29 04:23:04 CSSOSBE04-B09 nova-compute[31396]: ERROR nova.compute.manager [instance: fcaa5a47-fc48-489d-9827-6533bfd1a9fa] do_driver_attach) Apr 29 04:23:04 CSSOSBE04-B09 nova-compute[31396]: ERROR nova.compute.manager [instance: fcaa5a47-fc48-489d-9827-6533bfd1a9fa] File "/opt/stack/nova/nova/virt/block_device.py", line 509, in _volume_attach Apr 29 04:23:04 CSSOSBE04-B09 nova-compute[31396]: ERROR nova.compute.manager [instance: fcaa5a47-fc48-489d-9827-6533bfd1a9fa] volume_id=volume_id) Apr 29 04:23:04 CSSOSBE04-B09 nova-compute[31396]: ERROR nova.compute.manager [instance: 
fcaa5a47-fc48-489d-9827-6533bfd1a9fa] MultiattachNotSupportedByVirtDriver: Volume dc25f09a-6ae1-4b06-a814-73a8afaba62f has 'multiattach' set, which is not supported for this instance. Apr 29 04:23:04 CSSOSBE04-B09 nova-compute[31396]: ERROR nova.compute.manager [instance: fcaa5a47-fc48-489d-9827-6533bfd1a9fa] Apr 29 05:41:20 CSSOSBE04-B09 nova-compute[20455]: DEBUG nova.virt.libvirt.driver [-] Volume multiattach is not supported based on current versions of QEMU and libvirt. QEMU must be less than 2.10 or libvirt must be greater than or equal to 3.10. {{(pid=20455) _set_multiattach_support /opt/stack/nova/nova/virt/libvirt/driver.py:619}} stack at CSSOSBE04-B09:/tmp$ virsh --version 3.6.0 stack at CSSOSBE04-B09:/tmp$ kvm --version QEMU emulator version 2.10.1(Debian 1:2.10+dfsg-0ubuntu3.8~cloud1) Copyright (c) 2003-2017 Fabrice Bellard and the QEMU Project developers openstack volume show -c multiattach -c status sneha1 +-------------+-----------+ | Field | Value | +-------------+-----------+ | multiattach | True | | status | available | +-------------+-----------+ cinder extra-specs-list +--------------------------------------+-------------+--------------------------------------------------------------------+ | ID | Name | extra_specs | +--------------------------------------+-------------+--------------------------------------------------------------------+ | bd077fde-51c3-4581-80d5-5855e8ab2f6b | 3pariscsi_1 | {'volume_backend_name': '3pariscsi_1', 'multiattach': ' True'}| +--------------------------------------+-------------+--------------------------------------------------------------------+ echo $OS_COMPUTE_API_VERSION 2.60 pip list | grep python-novaclient DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7. python-novaclient 13.0.0 How do I fix this version issue on my setup to proceed? Please help. Thanks & Regards, Sneha Rai -------------- next part -------------- An HTML attachment was scrubbed... URL: From gn01737625 at gmail.com Thu May 2 09:13:35 2019 From: gn01737625 at gmail.com (Ming-Che Liu) Date: Thu, 2 May 2019 17:13:35 +0800 Subject: [Deploy problem] deploy openstack+monasca with kolla-ansible 8.0.0.0rc1. [kolla] In-Reply-To: <10f217bf-33a2-d40a-8bcf-6994c26be699@stackhpc.com> References: <10f217bf-33a2-d40a-8bcf-6994c26be699@stackhpc.com> Message-ID: Hello, Thank you for replying, my goal is to deploy [all-in-one] openstack+monasca(in the same physical machine/VM). I will check the detail error information and provide such logs for you, thank you. I also have a question about kolla-ansible 8.0.0.0rc1, when I check the new feature about kolla-ansible 8.0.0.0rc1, it seems only 8.0.0.0rc1 provide the "complete" monasca functionality, it that right(that means you can see monasca's plugin in openstack horizon, as the following picture)? Thank you very much. Regards, Shawn [image: monasca.png] Doug Szumski 於 2019年5月2日 週四 下午4:21寫道: > > On 01/05/2019 08:45, Ming-Che Liu wrote: > > Hello, > > > > I deployed openstack+monasca with kolla-ansible 8.0.0.0rc1. > > It doesn't look like Monasca is enabled in your globals.yml file. Are > you trying to set up OpenStack services first and then enable Monasca > afterwards? 
You can also deploy Monasca standalone if that is useful: > > > https://docs.openstack.org/kolla-ansible/latest/reference/logging-and-monitoring/monasca-guide.html > > > > > I follow the steps as mentioned in > > https://docs.openstack.org/kolla-ansible/latest/user/quickstart.html > > > > The setting in my computer's globals.yml as same as [Quick Start] > > tutorial (attached file: globals.yml is my setting). > > > > My machine environment as following: > > OS: Ubuntu 16.04 > > Kolla-ansible verions: 8.0.0.0rc1 > > ansible version: 2.7 > > > > When I execute [bootstrap-servers] and [prechecks], it seems ok (no > > fatal error or any interrupt). > > > > But when I execute [deploy], it will occur some error about > > rabbitmq(when I set enable_rabbitmq:yes) and nova compute service(when > > I set enable_rabbitmq:no). > > > > I have some detail screenshot about the errors as attached files, > > could you please help me to solve this problem? > > Please can you post more information on why the containers are not > starting. > > - Inspect rabbit and nova-compute logs (in > /var/lib/docker/volumes/kolla_logs/_data/) > > - Check relevant containers are running, and if they are restarting > check the output. Eg. docker logs --follow nova_compute > > > > > Thank you very much. > > > > [Attached file description]: > > globals.yml: my computer's setting about kolla-ansible > > > > As mentioned above, the following pictures show the errors, the > > rabbitmq error will occur if I set [enable_rabbitmq:yes], the nova > > compute service error will occur if I set [enable_rabbitmq:no]. > > docker-version.png > > kolla-ansible-version.png > > nova-compute-service-error.png > > rabbitmq_error.png > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: monasca.png Type: image/png Size: 18872 bytes Desc: not available URL: From shyambiradarsggsit at gmail.com Thu May 2 15:59:56 2019 From: shyambiradarsggsit at gmail.com (Shyam Biradar) Date: Thu, 2 May 2019 21:29:56 +0530 Subject: Kolla-ansible pike nova compute keeps restarting Message-ID: Hi, I am setting up all-in-one ubuntu based kolla-ansible pike openstack. Deployment is failing at following ansible task: TASK [nova : include_tasks] ****************************** ************************************************************ **************************** included: /root/virtnev/share/kolla-ansible/ansible/roles/nova/tasks/discover_computes.yml for localhost TASK [nova : Waiting for nova-compute service up] ************************************************************ ************************************ FAILED - RETRYING: Waiting for nova-compute service up (20 retries left). FAILED - RETRYING: Waiting for nova-compute service up (19 retries left). FAILED - RETRYING: Waiting for nova-compute service up (18 retries left). FAILED - RETRYING: Waiting for nova-compute service up (17 retries left). FAILED - RETRYING: Waiting for nova-compute service up (16 retries left). FAILED - RETRYING: Waiting for nova-compute service up (15 retries left). FAILED - RETRYING: Waiting for nova-compute service up (14 retries left). FAILED - RETRYING: Waiting for nova-compute service up (13 retries left). FAILED - RETRYING: Waiting for nova-compute service up (12 retries left). FAILED - RETRYING: Waiting for nova-compute service up (11 retries left). FAILED - RETRYING: Waiting for nova-compute service up (10 retries left). 
FAILED - RETRYING: Waiting for nova-compute service up (9 retries left). FAILED - RETRYING: Waiting for nova-compute service up (8 retries left). FAILED - RETRYING: Waiting for nova-compute service up (7 retries left). FAILED - RETRYING: Waiting for nova-compute service up (6 retries left). FAILED - RETRYING: Waiting for nova-compute service up (5 retries left). FAILED - RETRYING: Waiting for nova-compute service up (4 retries left). FAILED - RETRYING: Waiting for nova-compute service up (3 retries left). FAILED - RETRYING: Waiting for nova-compute service up (2 retries left). FAILED - RETRYING: Waiting for nova-compute service up (1 retries left). fatal: [localhost -> localhost]: FAILED! => {"attempts": 20, "changed": false, "cmd": ["docker", "exec", "kolla_toolbox", "openstack", "--os-interface", "internal", "--os-auth-url", "http://192.168.122.151:35357 ", "--os-identity-api-version", "3", "--os-project-domain-name", "default", "--os-tenant-name", "admin", "--os-username", "admin", "--os-password", " ivpu1km8qxnVQESvAF4cyTFstOvrbxGUHjFF15gZ", "--os-user-domain-name", "default", "compute", "service", "list", "-f", "json", "--service", "nova-compute"], "delta": "0:00:02.555356", "end": "2019-05-02 09:24:45.485786", "rc": 0, "start": "2019-05-02 09:24:42.930430", "stderr": "", "stderr_lines": [], "stdout": "[]", "stdout_lines": ["[]"]} -------------------------------------------------------------------- I can see following stack trace in nova-compute container log 4. 2019-05-02 08:21:30.522 7 INFO nova.service [-] Starting compute node (version 16.1.7) 2019-05-02 08:21:30.524 7 ERROR oslo_service.service [-] Error starting thread.: PlacementNotConfigured: This compute is not configured to talk to the placement service. Configure the [placement] section of nova.conf and restart the service. 2019-05-02 08:21:30.524 7 ERROR oslo_service.service Traceback (most recent call last): 2019-05-02 08:21:30.524 7 ERROR oslo_service.service File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_service/service.py", line 721, in run_service 2019-05-02 08:21:30.524 7 ERROR oslo_service.service service.start() 2019-05-02 08:21:30.524 7 ERROR oslo_service.service File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/service.py", line 156, in start 2019-05-02 08:21:30.524 7 ERROR oslo_service.service self.manager.init_host() 2019-05-02 08:21:30.524 7 ERROR oslo_service.service File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/compute/manager.py", line 1155, in init_host 2019-05-02 08:21:30.524 7 ERROR oslo_service.service raise exception. PlacementNotConfigured() 2019-05-02 08:21:30.524 7 ERROR oslo_service.service PlacementNotConfigured: This compute is not configured to talk to the placement service. Configure the [placement] section of nova.conf and restart the service. 2019-05-02 08:21:30.524 7 ERROR oslo_service.service 2019-05-02 08:21:59.229 7 INFO os_vif [-] Loaded VIF plugins: ovs, linux_bridge --------------------------------------------------------------------- I saw nova-compute nova.conf has [placement] section configured well and it's same as nova_api's placement section. Other nova containers are started well. Any thoughts? Best Regards, Shyam Biradar Email Id: shyambiradarsggsit at gmail.com Contact: 8600266938 -------------- next part -------------- An HTML attachment was scrubbed... 
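A note on the PlacementNotConfigured error above: nova-compute raises it when the config file it actually loads has no usable [placement] section, and if I remember right the Pike check keys on os_region_name specifically, so a section that otherwise looks complete but omits that one option still trips it. It is worth checking the rendered /etc/kolla/nova-compute/nova.conf on the host (that is what ends up as /etc/nova/nova.conf inside the container) rather than the templates. A typical Pike-era section, as a sketch only -- the auth URL is the one from the command output above, the password and region are placeholders:

[placement]
auth_type = password
auth_url = http://192.168.122.151:35357
project_domain_name = default
user_domain_name = default
project_name = service
username = placement
password = <placement service password>
os_region_name = RegionOne

If the section is present and correct, restarting the nova_compute container should be enough; no database changes are involved.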
URL: From gn01737625 at gmail.com Fri May 3 01:22:00 2019 From: gn01737625 at gmail.com (Ming-Che Liu) Date: Fri, 3 May 2019 09:22:00 +0800 Subject: [Deploy problem] deploy openstack+monasca with kolla-ansible 8.0.0.0rc1. In-Reply-To: References: Message-ID: Hi Mark, Sure, I will do that, thanks. Regards, Ming-Che Mark Goddard 於 2019年5月3日 週五 上午1:12寫道: > > > On Wed, 1 May 2019 at 17:10, Ming-Che Liu wrote: > >> Hello, >> >> I deployed openstack+monasca with kolla-ansible 8.0.0.0rc1. >> >> I follow the steps as mentioned in >> https://docs.openstack.org/kolla-ansible/latest/user/quickstart.html >> >> The setting in my computer's globals.yml as same as [Quick Start] >> tutorial (attached file: globals.yml is my setting). >> >> My machine environment as following: >> OS: Ubuntu 16.04 >> Kolla-ansible verions: 8.0.0.0rc1 >> ansible version: 2.7 >> >> When I execute [bootstrap-servers] and [prechecks], it seems ok (no fatal >> error or any interrupt). >> >> But when I execute [deploy], it will occur some error about rabbitmq(when >> I set enable_rabbitmq:yes) and nova compute service(when I >> set enable_rabbitmq:no). >> >> I have some detail screenshot about the errors as attached files, could >> you please help me to solve this problem? >> >> Thank you very much. >> >> [Attached file description]: >> globals.yml: my computer's setting about kolla-ansible >> >> As mentioned above, the following pictures show the errors, the rabbitmq >> error will occur if I set [enable_rabbitmq:yes], the nova compute service >> error will occur if I set [enable_rabbitmq:no]. >> > > Hi Ming-Che, > > Since Stein, we no longer test Kolla Ansible with Ubuntu 16.04 upstream. > Could you try again using Ubuntu 18.04? > > Regards, > Mark > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gn01737625 at gmail.com Fri May 3 07:26:28 2019 From: gn01737625 at gmail.com (Ming-Che Liu) Date: Fri, 3 May 2019 15:26:28 +0800 Subject: [Deploy problem] deploy openstack+monasca with kolla-ansible 8.0.0.0rc1. In-Reply-To: References: Message-ID: Hi Mark, I tried to deploy openstack+monasca with kolla-ansible 8.0.0.0rc1(in the same machine), but still encounter some fatal error. The attached file:golbals.yml is my setting, machine_package_setting is machine environment setting. The error is: RUNNING HANDLER [rabbitmq : Waiting for rabbitmq to start on first node] ************************************************************ fatal: [localhost]: FAILED! => {"changed": true, "cmd": "docker exec rabbitmq rabbitmqctl wait /var/lib/rabbitmq/mnesia/rabbitmq.pid", "delta": "0:00:00.861054", "end": "2019-05-03 15:17:42.387873", "msg": "non-zero return code", "rc": 137, "start": "2019-05-03 15:17:41.526819", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []} When I use command "docker inspect rabbitmq_id |grep RestartCount", I find rabbitmq will restart many times such as: kaga at agre-an21:~$ sudo docker inspect 5567f37cc78a |grep RestartCount "RestartCount": 15, Could please help to solve this problem? Thanks. Regards, Ming-Che Ming-Che Liu 於 2019年5月3日 週五 上午9:22寫道: > Hi Mark, > > Sure, I will do that, thanks. > > Regards, > > Ming-Che > > Mark Goddard 於 2019年5月3日 週五 上午1:12寫道: > >> >> >> On Wed, 1 May 2019 at 17:10, Ming-Che Liu wrote: >> >>> Hello, >>> >>> I deployed openstack+monasca with kolla-ansible 8.0.0.0rc1. 
>>> >>> I follow the steps as mentioned in >>> https://docs.openstack.org/kolla-ansible/latest/user/quickstart.html >>> >>> The setting in my computer's globals.yml as same as [Quick Start] >>> tutorial (attached file: globals.yml is my setting). >>> >>> My machine environment as following: >>> OS: Ubuntu 16.04 >>> Kolla-ansible verions: 8.0.0.0rc1 >>> ansible version: 2.7 >>> >>> When I execute [bootstrap-servers] and [prechecks], it seems ok (no >>> fatal error or any interrupt). >>> >>> But when I execute [deploy], it will occur some error about >>> rabbitmq(when I set enable_rabbitmq:yes) and nova compute service(when I >>> set enable_rabbitmq:no). >>> >>> I have some detail screenshot about the errors as attached files, could >>> you please help me to solve this problem? >>> >>> Thank you very much. >>> >>> [Attached file description]: >>> globals.yml: my computer's setting about kolla-ansible >>> >>> As mentioned above, the following pictures show the errors, the rabbitmq >>> error will occur if I set [enable_rabbitmq:yes], the nova compute service >>> error will occur if I set [enable_rabbitmq:no]. >>> >> >> Hi Ming-Che, >> >> Since Stein, we no longer test Kolla Ansible with Ubuntu 16.04 upstream. >> Could you try again using Ubuntu 18.04? >> >> Regards, >> Mark >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: machine_package_setting Type: application/octet-stream Size: 1885 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: globals.yml Type: application/x-yaml Size: 20184 bytes Desc: not available URL: From gn01737625 at gmail.com Fri May 3 08:13:26 2019 From: gn01737625 at gmail.com (Ming-Che Liu) Date: Fri, 3 May 2019 16:13:26 +0800 Subject: [Deploy problem] deploy openstack+monasca with kolla-ansible 8.0.0.0rc1. In-Reply-To: References: Message-ID: Apologies,this mail will attach rabbitmq log file(ues command "docker logs --follow rabbitmq") for debug. Logs in /var/lib/docker/volumes/kolla_logs/_data/rabbitmq are empty. thanks. Regards, Ming-Che Ming-Che Liu 於 2019年5月3日 週五 下午3:26寫道: > Hi Mark, > > I tried to deploy openstack+monasca with kolla-ansible 8.0.0.0rc1(in the > same machine), but still encounter some fatal error. > > The attached file:golbals.yml is my setting, machine_package_setting is > machine environment setting. > > The error is: > RUNNING HANDLER [rabbitmq : Waiting for rabbitmq to start on first node] > ************************************************************ > fatal: [localhost]: FAILED! => {"changed": true, "cmd": "docker exec > rabbitmq rabbitmqctl wait /var/lib/rabbitmq/mnesia/rabbitmq.pid", "delta": > "0:00:00.861054", "end": "2019-05-03 15:17:42.387873", "msg": "non-zero > return code", "rc": 137, "start": "2019-05-03 15:17:41.526819", "stderr": > "", "stderr_lines": [], "stdout": "", "stdout_lines": []} > > When I use command "docker inspect rabbitmq_id |grep RestartCount", I > find rabbitmq will restart many times > > such as: > > kaga at agre-an21:~$ sudo docker inspect 5567f37cc78a |grep RestartCount > "RestartCount": 15, > > Could please help to solve this problem? Thanks. > > Regards, > > Ming-Che > > > > > > > > Ming-Che Liu 於 2019年5月3日 週五 上午9:22寫道: > >> Hi Mark, >> >> Sure, I will do that, thanks. 
>> >> Regards, >> >> Ming-Che >> >> Mark Goddard 於 2019年5月3日 週五 上午1:12寫道: >> >>> >>> >>> On Wed, 1 May 2019 at 17:10, Ming-Che Liu wrote: >>> >>>> Hello, >>>> >>>> I deployed openstack+monasca with kolla-ansible 8.0.0.0rc1. >>>> >>>> I follow the steps as mentioned in >>>> https://docs.openstack.org/kolla-ansible/latest/user/quickstart.html >>>> >>>> The setting in my computer's globals.yml as same as [Quick Start] >>>> tutorial (attached file: globals.yml is my setting). >>>> >>>> My machine environment as following: >>>> OS: Ubuntu 16.04 >>>> Kolla-ansible verions: 8.0.0.0rc1 >>>> ansible version: 2.7 >>>> >>>> When I execute [bootstrap-servers] and [prechecks], it seems ok (no >>>> fatal error or any interrupt). >>>> >>>> But when I execute [deploy], it will occur some error about >>>> rabbitmq(when I set enable_rabbitmq:yes) and nova compute service(when I >>>> set enable_rabbitmq:no). >>>> >>>> I have some detail screenshot about the errors as attached files, could >>>> you please help me to solve this problem? >>>> >>>> Thank you very much. >>>> >>>> [Attached file description]: >>>> globals.yml: my computer's setting about kolla-ansible >>>> >>>> As mentioned above, the following pictures show the errors, the >>>> rabbitmq error will occur if I set [enable_rabbitmq:yes], the nova compute >>>> service error will occur if I set [enable_rabbitmq:no]. >>>> >>> >>> Hi Ming-Che, >>> >>> Since Stein, we no longer test Kolla Ansible with Ubuntu 16.04 upstream. >>> Could you try again using Ubuntu 18.04? >>> >>> Regards, >>> Mark >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: rabbitmq_docker_log Type: application/octet-stream Size: 340347 bytes Desc: not available URL: From yadav.akshay58 at gmail.com Fri May 3 11:01:36 2019 From: yadav.akshay58 at gmail.com (Akki yadav) Date: Fri, 3 May 2019 16:31:36 +0530 Subject: [Neutron] Can I create VM on flat network which doesnt have any subnet attached to it. Message-ID: Hello Team, Hope you all are doing good. I wanted to know that can I launch a VM on a flat network directly which doesn't have any subnet attached to it. Steps to be followed: Create a flat Network without a subnet Attach the network to create a VM. Aim :- Spawn 2 VM's on network without any subnet, without any IP assigned to them. Then statically allocate same subnet IP to them and ping each other. Issue:- VM creation is getting failed stating that there is no subnet found. How can we resolve this? Thanks & Regards Akshay -------------- next part -------------- An HTML attachment was scrubbed... URL: From manuel.sb at garvan.org.au Thu May 2 03:16:13 2019 From: manuel.sb at garvan.org.au (Manuel Sopena Ballesteros) Date: Thu, 2 May 2019 03:16:13 +0000 Subject: how to setup nvme pci-pasthrough to get close to native performance? Message-ID: <9D8A2486E35F0941A60430473E29F15B017EA662EB@mxdb2.ad.garvan.unsw.edu.au> Dear Openstack community, I am configuring a high performance storage vms, I decided to go to the easy path (pci-passthrough), I can spin up vms and see the pci devices, however performance is below native/bare metal. 
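Before comparing numbers, one quick sanity check inside the guest is that the controller really is the passed-through Intel device from the whitelist further down in this message, rather than some emulated disk, for example:

  lspci -nn | grep -i 'non-volatile'

which should list the 8086:0953 controllers.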
Native/Bare metal performance: [root at zeus-54 data]# fio --ioengine=libaio --name=test --filename=test --bs=4k --size=40G --readwrite=randrw --runtime=120 --time_based test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1 fio-3.1 Starting 1 process Jobs: 1 (f=1): [f(1)][100.0%][r=39.5MiB/s,w=39.6MiB/s][r=10.1k,w=10.1k IOPS][eta 00m:00s] test: (groupid=0, jobs=1): err= 0: pid=50892: Wed May 1 22:22:45 2019 read: IOPS=9805, BW=38.3MiB/s (40.2MB/s)(4596MiB/120001msec) slat (usec): min=39, max=6678, avg=94.72, stdev=55.78 clat (nsec): min=450, max=18224, avg=525.83, stdev=120.10 lat (usec): min=39, max=6679, avg=95.36, stdev=55.79 clat percentiles (nsec): | 1.00th=[ 462], 5.00th=[ 478], 10.00th=[ 482], 20.00th=[ 486], | 30.00th=[ 490], 40.00th=[ 494], 50.00th=[ 502], 60.00th=[ 510], | 70.00th=[ 516], 80.00th=[ 532], 90.00th=[ 596], 95.00th=[ 676], | 99.00th=[ 860], 99.50th=[ 1048], 99.90th=[ 1384], 99.95th=[ 2480], | 99.99th=[ 3728] bw ( KiB/s): min= 720, max=40736, per=100.00%, avg=39389.00, stdev=5317.58, samples=239 iops : min= 180, max=10184, avg=9847.23, stdev=1329.39, samples=239 write: IOPS=9799, BW=38.3MiB/s (40.1MB/s)(4594MiB/120001msec) slat (nsec): min=2982, max=106207, avg=4220.09, stdev=980.04 clat (nsec): min=407, max=18130, avg=451.48, stdev=103.71 lat (usec): min=3, max=111, avg= 4.74, stdev= 1.03 clat percentiles (nsec): | 1.00th=[ 414], 5.00th=[ 418], 10.00th=[ 422], 20.00th=[ 430], | 30.00th=[ 434], 40.00th=[ 434], 50.00th=[ 438], 60.00th=[ 438], | 70.00th=[ 442], 80.00th=[ 446], 90.00th=[ 462], 95.00th=[ 588], | 99.00th=[ 700], 99.50th=[ 916], 99.90th=[ 1208], 99.95th=[ 1288], | 99.99th=[ 3536] bw ( KiB/s): min= 752, max=42608, per=100.00%, avg=39366.63, stdev=5355.73, samples=239 iops : min= 188, max=10652, avg=9841.64, stdev=1338.93, samples=239 lat (nsec) : 500=69.98%, 750=28.64%, 1000=0.90% lat (usec) : 2=0.42%, 4=0.04%, 10=0.01%, 20=0.01% cpu : usr=2.20%, sys=10.85%, ctx=1176675, majf=0, minf=1372 IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwt: total=1176625,1175958,0, short=0,0,0, dropped=0,0,0 latency : target=0, window=0, percentile=100.00%, depth=1 Run status group 0 (all jobs): READ: bw=38.3MiB/s (40.2MB/s), 38.3MiB/s-38.3MiB/s (40.2MB/s-40.2MB/s), io=4596MiB (4819MB), run=120001-120001msec WRITE: bw=38.3MiB/s (40.1MB/s), 38.3MiB/s-38.3MiB/s (40.1MB/s-40.1MB/s), io=4594MiB (4817MB), run=120001-120001msec Disk stats (read/write): nvme9n1: ios=1174695/883620, merge=0/0, ticks=105502/72225, in_queue=192101, util=99.28% VM performance: [centos at kudu-1 nvme0]$ sudo fio --ioengine=libaio --name=test --filename=test --bs=4k --size=40G --readwrite=randrw --runtime=120 --time_based test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1 fio-3.1 Starting 1 process Jobs: 1 (f=1): [m(1)][100.0%][r=29.2MiB/s,w=29.7MiB/s][r=7487,w=7595 IOPS][eta 00m:00s] test: (groupid=0, jobs=1): err= 0: pid=44383: Wed May 1 12:22:24 2019 read: IOPS=6994, BW=27.3MiB/s (28.6MB/s)(3278MiB/120000msec) slat (usec): min=54, max=20476, avg=115.27, stdev=71.45 clat (nsec): min=1757, max=31476, avg=2163.02, stdev=688.66 lat (usec): min=56, max=20481, avg=118.51, stdev=71.66 clat percentiles (nsec): | 1.00th=[ 1800], 5.00th=[ 1832], 10.00th=[ 1864], 20.00th=[ 1992], | 30.00th=[ 2040], 40.00th=[ 2064], 50.00th=[ 2064], 
60.00th=[ 2096], | 70.00th=[ 2096], 80.00th=[ 2128], 90.00th=[ 2480], 95.00th=[ 2544], | 99.00th=[ 4448], 99.50th=[ 5536], 99.90th=[11072], 99.95th=[12736], | 99.99th=[18560] bw ( KiB/s): min= 952, max=31224, per=100.00%, avg=28153.51, stdev=4126.89, samples=237 iops : min= 238, max= 7806, avg=7038.23, stdev=1031.70, samples=237 write: IOPS=6985, BW=27.3MiB/s (28.6MB/s)(3274MiB/120000msec) slat (usec): min=7, max=963, avg=12.60, stdev= 6.24 clat (nsec): min=1662, max=199250, avg=2030.26, stdev=712.33 lat (usec): min=10, max=970, avg=15.68, stdev= 6.48 clat percentiles (nsec): | 1.00th=[ 1688], 5.00th=[ 1720], 10.00th=[ 1736], 20.00th=[ 1864], | 30.00th=[ 1928], 40.00th=[ 1944], 50.00th=[ 1944], 60.00th=[ 1960], | 70.00th=[ 1960], 80.00th=[ 1992], 90.00th=[ 2352], 95.00th=[ 2384], | 99.00th=[ 4048], 99.50th=[ 4768], 99.90th=[11456], 99.95th=[13120], | 99.99th=[19072] bw ( KiB/s): min= 912, max=31880, per=100.00%, avg=28119.64, stdev=4176.38, samples=237 iops : min= 228, max= 7970, avg=7029.75, stdev=1044.07, samples=237 lat (usec) : 2=51.56%, 4=47.17%, 10=1.03%, 20=0.22%, 50=0.01% lat (usec) : 250=0.01% cpu : usr=4.96%, sys=28.37%, ctx=839307, majf=0, minf=26 IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwt: total=839283,838268,0, short=0,0,0, dropped=0,0,0 latency : target=0, window=0, percentile=100.00%, depth=1 Run status group 0 (all jobs): READ: bw=27.3MiB/s (28.6MB/s), 27.3MiB/s-27.3MiB/s (28.6MB/s-28.6MB/s), io=3278MiB (3438MB), run=120000-120000msec WRITE: bw=27.3MiB/s (28.6MB/s), 27.3MiB/s-27.3MiB/s (28.6MB/s-28.6MB/s), io=3274MiB (3434MB), run=120000-120000msec Disk stats (read/write): nvme0n1: ios=838322/651596, merge=0/0, ticks=83804/22119, in_queue=104773, util=70.18% This is my Openstack rocky configuration: nova.conf on controller node [pci] alias = { "vendor_id":"10de", "product_id":"1db1", "device_type":"type-PCI", "name":"nv_v100" } alias = { "vendor_id":"8086", "product_id":"0953", "device_type":"type-PCI", "name":"nvme"} nova.conf on compute node: [pci] passthrough_whitelist = [ {"address":"0000:84:00.0"}, {"address":"0000:85:00.0"}, {"address":"0000:86:00.0"}, {"address":"0000:87:00.0"} ] alias = { "vendor_id":"8086", "product_id":"0953", "device_type":"type-PCI", "name":"nvme"} This is how the nvmes are exposed to the vm
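For completeness, with an alias-based setup like the one above the guest requests the devices through a flavor extra spec; a sketch, with the flavor name and device count illustrative only:

  openstack flavor set nvme.large --property "pci_passthrough:alias"="nvme:4"

The alias name has to match the one defined in nova.conf.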
Guest OS is centos 7.6 so I am guessing nvme drivers are included. Any help about what needs to my configuration to get close to native io performance? Thank you very much Manuel From: Manuel Sopena Ballesteros [mailto:manuel.sb at garvan.org.au] Sent: Wednesday, May 1, 2019 10:31 PM To: openstack-discuss at lists.openstack.org Subject: how to get best io performance from my block devices Dear Openstack community, I would like to have a high performance distributed database running in Openstack vms. I tried attaching dedicated nvme pci devices to the vm but the performance is not as good as I can get from bare metal. Bare metal: [root at zeus-54 data]# fio --ioengine=libaio --name=test --filename=test --bs=4k --size=40G --readwrite=randrw --runtime=120 --time_based test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1 fio-3.1 Starting 1 process Jobs: 1 (f=1): [f(1)][100.0%][r=39.5MiB/s,w=39.6MiB/s][r=10.1k,w=10.1k IOPS][eta 00m:00s] test: (groupid=0, jobs=1): err= 0: pid=50892: Wed May 1 22:22:45 2019 read: IOPS=9805, BW=38.3MiB/s (40.2MB/s)(4596MiB/120001msec) slat (usec): min=39, max=6678, avg=94.72, stdev=55.78 clat (nsec): min=450, max=18224, avg=525.83, stdev=120.10 lat (usec): min=39, max=6679, avg=95.36, stdev=55.79 clat percentiles (nsec): | 1.00th=[ 462], 5.00th=[ 478], 10.00th=[ 482], 20.00th=[ 486], | 30.00th=[ 490], 40.00th=[ 494], 50.00th=[ 502], 60.00th=[ 510], | 70.00th=[ 516], 80.00th=[ 532], 90.00th=[ 596], 95.00th=[ 676], | 99.00th=[ 860], 99.50th=[ 1048], 99.90th=[ 1384], 99.95th=[ 2480], | 99.99th=[ 3728] bw ( KiB/s): min= 720, max=40736, per=100.00%, avg=39389.00, stdev=5317.58, samples=239 iops : min= 180, max=10184, avg=9847.23, stdev=1329.39, samples=239 write: IOPS=9799, BW=38.3MiB/s (40.1MB/s)(4594MiB/120001msec) slat (nsec): min=2982, max=106207, avg=4220.09, stdev=980.04 clat (nsec): min=407, max=18130, avg=451.48, stdev=103.71 lat (usec): min=3, max=111, avg= 4.74, stdev= 1.03 clat percentiles (nsec): | 1.00th=[ 414], 5.00th=[ 418], 10.00th=[ 422], 20.00th=[ 430], | 30.00th=[ 434], 40.00th=[ 434], 50.00th=[ 438], 60.00th=[ 438], | 70.00th=[ 442], 80.00th=[ 446], 90.00th=[ 462], 95.00th=[ 588], | 99.00th=[ 700], 99.50th=[ 916], 99.90th=[ 1208], 99.95th=[ 1288], | 99.99th=[ 3536] bw ( KiB/s): min= 752, max=42608, per=100.00%, avg=39366.63, stdev=5355.73, samples=239 iops : min= 188, max=10652, avg=9841.64, stdev=1338.93, samples=239 lat (nsec) : 500=69.98%, 750=28.64%, 1000=0.90% lat (usec) : 2=0.42%, 4=0.04%, 10=0.01%, 20=0.01% cpu : usr=2.20%, sys=10.85%, ctx=1176675, majf=0, minf=1372 IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwt: total=1176625,1175958,0, short=0,0,0, dropped=0,0,0 latency : target=0, window=0, percentile=100.00%, depth=1 Run status group 0 (all jobs): READ: bw=38.3MiB/s (40.2MB/s), 38.3MiB/s-38.3MiB/s (40.2MB/s-40.2MB/s), io=4596MiB (4819MB), run=120001-120001msec WRITE: bw=38.3MiB/s (40.1MB/s), 38.3MiB/s-38.3MiB/s (40.1MB/s-40.1MB/s), io=4594MiB (4817MB), run=120001-120001msec Disk stats (read/write): nvme9n1: ios=1174695/883620, merge=0/0, ticks=105502/72225, in_queue=192101, util=99.28% >From vm: [centos at kudu-1 nvme0]$ sudo fio --ioengine=libaio --name=test --filename=test --bs=4k --size=40G --readwrite=randrw --runtime=120 --time_based test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 
4096B-4096B, ioengine=libaio, iodepth=1 fio-3.1 Starting 1 process Jobs: 1 (f=1): [m(1)][100.0%][r=29.2MiB/s,w=29.7MiB/s][r=7487,w=7595 IOPS][eta 00m:00s] test: (groupid=0, jobs=1): err= 0: pid=44383: Wed May 1 12:22:24 2019 read: IOPS=6994, BW=27.3MiB/s (28.6MB/s)(3278MiB/120000msec) slat (usec): min=54, max=20476, avg=115.27, stdev=71.45 clat (nsec): min=1757, max=31476, avg=2163.02, stdev=688.66 lat (usec): min=56, max=20481, avg=118.51, stdev=71.66 clat percentiles (nsec): | 1.00th=[ 1800], 5.00th=[ 1832], 10.00th=[ 1864], 20.00th=[ 1992], | 30.00th=[ 2040], 40.00th=[ 2064], 50.00th=[ 2064], 60.00th=[ 2096], | 70.00th=[ 2096], 80.00th=[ 2128], 90.00th=[ 2480], 95.00th=[ 2544], | 99.00th=[ 4448], 99.50th=[ 5536], 99.90th=[11072], 99.95th=[12736], | 99.99th=[18560] bw ( KiB/s): min= 952, max=31224, per=100.00%, avg=28153.51, stdev=4126.89, samples=237 iops : min= 238, max= 7806, avg=7038.23, stdev=1031.70, samples=237 write: IOPS=6985, BW=27.3MiB/s (28.6MB/s)(3274MiB/120000msec) slat (usec): min=7, max=963, avg=12.60, stdev= 6.24 clat (nsec): min=1662, max=199250, avg=2030.26, stdev=712.33 lat (usec): min=10, max=970, avg=15.68, stdev= 6.48 clat percentiles (nsec): | 1.00th=[ 1688], 5.00th=[ 1720], 10.00th=[ 1736], 20.00th=[ 1864], | 30.00th=[ 1928], 40.00th=[ 1944], 50.00th=[ 1944], 60.00th=[ 1960], | 70.00th=[ 1960], 80.00th=[ 1992], 90.00th=[ 2352], 95.00th=[ 2384], | 99.00th=[ 4048], 99.50th=[ 4768], 99.90th=[11456], 99.95th=[13120], | 99.99th=[19072] bw ( KiB/s): min= 912, max=31880, per=100.00%, avg=28119.64, stdev=4176.38, samples=237 iops : min= 228, max= 7970, avg=7029.75, stdev=1044.07, samples=237 lat (usec) : 2=51.56%, 4=47.17%, 10=1.03%, 20=0.22%, 50=0.01% lat (usec) : 250=0.01% cpu : usr=4.96%, sys=28.37%, ctx=839307, majf=0, minf=26 IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwt: total=839283,838268,0, short=0,0,0, dropped=0,0,0 latency : target=0, window=0, percentile=100.00%, depth=1 Run status group 0 (all jobs): READ: bw=27.3MiB/s (28.6MB/s), 27.3MiB/s-27.3MiB/s (28.6MB/s-28.6MB/s), io=3278MiB (3438MB), run=120000-120000msec WRITE: bw=27.3MiB/s (28.6MB/s), 27.3MiB/s-27.3MiB/s (28.6MB/s-28.6MB/s), io=3274MiB (3434MB), run=120000-120000msec Disk stats (read/write): nvme0n1: ios=838322/651596, merge=0/0, ticks=83804/22119, in_queue=104773, util=70.18% Is there a way I can get near bare metal performance from my nvme block devices? NOTICE Please consider the environment before printing this email. This message and any attachments are intended for the addressee named and may contain legally privileged/confidential/copyright information. If you are not the intended recipient, you should not read, use, disclose, copy or distribute this communication. If you have received this message in error please notify us at once by return email and then delete both messages. We accept no liability for the distribution of viruses or similar in electronic communications. This notice should not be removed. NOTICE Please consider the environment before printing this email. This message and any attachments are intended for the addressee named and may contain legally privileged/confidential/copyright information. If you are not the intended recipient, you should not read, use, disclose, copy or distribute this communication. 
If you have received this message in error please notify us at once by return email and then delete both messages. We accept no liability for the distribution of viruses or similar in electronic communications. This notice should not be removed. -------------- next part -------------- An HTML attachment was scrubbed... URL: From Tim.Bell at cern.ch Thu May 2 13:54:34 2019 From: Tim.Bell at cern.ch (Tim Bell) Date: Thu, 2 May 2019 13:54:34 +0000 Subject: [ops] how to get best io performance from my block devices Message-ID: <3D6C1968-76B0-449F-B389-1B59384D16F9@cern.ch> There are some hints in https://wiki.openstack.org/wiki/Documentation/HypervisorTuningGuide There are some tips in https://www.linux-kvm.org/page/Tuning_KVM too but you’d need to find the corresponding OpenStack flags on the guest/images/hosts/flavors. Overall, there are several options so it’s recommended to establish a baseline performance on a representative work load and try the various options. Tim From: Manuel Sopena Ballesteros Date: Wednesday, 1 May 2019 at 06:35 To: "openstack-discuss at lists.openstack.org" Subject: how to get best io performance from my block devices Dear Openstack community, I would like to have a high performance distributed database running in Openstack vms. I tried attaching dedicated nvme pci devices to the vm but the performance is not as good as I can get from bare metal. Bare metal: [root at zeus-54 data]# fio --ioengine=libaio --name=test --filename=test --bs=4k --size=40G --readwrite=randrw --runtime=120 --time_based test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1 fio-3.1 Starting 1 process Jobs: 1 (f=1): [f(1)][100.0%][r=39.5MiB/s,w=39.6MiB/s][r=10.1k,w=10.1k IOPS][eta 00m:00s] test: (groupid=0, jobs=1): err= 0: pid=50892: Wed May 1 22:22:45 2019 read: IOPS=9805, BW=38.3MiB/s (40.2MB/s)(4596MiB/120001msec) slat (usec): min=39, max=6678, avg=94.72, stdev=55.78 clat (nsec): min=450, max=18224, avg=525.83, stdev=120.10 lat (usec): min=39, max=6679, avg=95.36, stdev=55.79 clat percentiles (nsec): | 1.00th=[ 462], 5.00th=[ 478], 10.00th=[ 482], 20.00th=[ 486], | 30.00th=[ 490], 40.00th=[ 494], 50.00th=[ 502], 60.00th=[ 510], | 70.00th=[ 516], 80.00th=[ 532], 90.00th=[ 596], 95.00th=[ 676], | 99.00th=[ 860], 99.50th=[ 1048], 99.90th=[ 1384], 99.95th=[ 2480], | 99.99th=[ 3728] bw ( KiB/s): min= 720, max=40736, per=100.00%, avg=39389.00, stdev=5317.58, samples=239 iops : min= 180, max=10184, avg=9847.23, stdev=1329.39, samples=239 write: IOPS=9799, BW=38.3MiB/s (40.1MB/s)(4594MiB/120001msec) slat (nsec): min=2982, max=106207, avg=4220.09, stdev=980.04 clat (nsec): min=407, max=18130, avg=451.48, stdev=103.71 lat (usec): min=3, max=111, avg= 4.74, stdev= 1.03 clat percentiles (nsec): | 1.00th=[ 414], 5.00th=[ 418], 10.00th=[ 422], 20.00th=[ 430], | 30.00th=[ 434], 40.00th=[ 434], 50.00th=[ 438], 60.00th=[ 438], | 70.00th=[ 442], 80.00th=[ 446], 90.00th=[ 462], 95.00th=[ 588], | 99.00th=[ 700], 99.50th=[ 916], 99.90th=[ 1208], 99.95th=[ 1288], | 99.99th=[ 3536] bw ( KiB/s): min= 752, max=42608, per=100.00%, avg=39366.63, stdev=5355.73, samples=239 iops : min= 188, max=10652, avg=9841.64, stdev=1338.93, samples=239 lat (nsec) : 500=69.98%, 750=28.64%, 1000=0.90% lat (usec) : 2=0.42%, 4=0.04%, 10=0.01%, 20=0.01% cpu : usr=2.20%, sys=10.85%, ctx=1176675, majf=0, minf=1372 IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 
8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwt: total=1176625,1175958,0, short=0,0,0, dropped=0,0,0 latency : target=0, window=0, percentile=100.00%, depth=1 Run status group 0 (all jobs): READ: bw=38.3MiB/s (40.2MB/s), 38.3MiB/s-38.3MiB/s (40.2MB/s-40.2MB/s), io=4596MiB (4819MB), run=120001-120001msec WRITE: bw=38.3MiB/s (40.1MB/s), 38.3MiB/s-38.3MiB/s (40.1MB/s-40.1MB/s), io=4594MiB (4817MB), run=120001-120001msec Disk stats (read/write): nvme9n1: ios=1174695/883620, merge=0/0, ticks=105502/72225, in_queue=192101, util=99.28% From vm: [centos at kudu-1 nvme0]$ sudo fio --ioengine=libaio --name=test --filename=test --bs=4k --size=40G --readwrite=randrw --runtime=120 --time_based test: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1 fio-3.1 Starting 1 process Jobs: 1 (f=1): [m(1)][100.0%][r=29.2MiB/s,w=29.7MiB/s][r=7487,w=7595 IOPS][eta 00m:00s] test: (groupid=0, jobs=1): err= 0: pid=44383: Wed May 1 12:22:24 2019 read: IOPS=6994, BW=27.3MiB/s (28.6MB/s)(3278MiB/120000msec) slat (usec): min=54, max=20476, avg=115.27, stdev=71.45 clat (nsec): min=1757, max=31476, avg=2163.02, stdev=688.66 lat (usec): min=56, max=20481, avg=118.51, stdev=71.66 clat percentiles (nsec): | 1.00th=[ 1800], 5.00th=[ 1832], 10.00th=[ 1864], 20.00th=[ 1992], | 30.00th=[ 2040], 40.00th=[ 2064], 50.00th=[ 2064], 60.00th=[ 2096], | 70.00th=[ 2096], 80.00th=[ 2128], 90.00th=[ 2480], 95.00th=[ 2544], | 99.00th=[ 4448], 99.50th=[ 5536], 99.90th=[11072], 99.95th=[12736], | 99.99th=[18560] bw ( KiB/s): min= 952, max=31224, per=100.00%, avg=28153.51, stdev=4126.89, samples=237 iops : min= 238, max= 7806, avg=7038.23, stdev=1031.70, samples=237 write: IOPS=6985, BW=27.3MiB/s (28.6MB/s)(3274MiB/120000msec) slat (usec): min=7, max=963, avg=12.60, stdev= 6.24 clat (nsec): min=1662, max=199250, avg=2030.26, stdev=712.33 lat (usec): min=10, max=970, avg=15.68, stdev= 6.48 clat percentiles (nsec): | 1.00th=[ 1688], 5.00th=[ 1720], 10.00th=[ 1736], 20.00th=[ 1864], | 30.00th=[ 1928], 40.00th=[ 1944], 50.00th=[ 1944], 60.00th=[ 1960], | 70.00th=[ 1960], 80.00th=[ 1992], 90.00th=[ 2352], 95.00th=[ 2384], | 99.00th=[ 4048], 99.50th=[ 4768], 99.90th=[11456], 99.95th=[13120], | 99.99th=[19072] bw ( KiB/s): min= 912, max=31880, per=100.00%, avg=28119.64, stdev=4176.38, samples=237 iops : min= 228, max= 7970, avg=7029.75, stdev=1044.07, samples=237 lat (usec) : 2=51.56%, 4=47.17%, 10=1.03%, 20=0.22%, 50=0.01% lat (usec) : 250=0.01% cpu : usr=4.96%, sys=28.37%, ctx=839307, majf=0, minf=26 IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0% submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0% issued rwt: total=839283,838268,0, short=0,0,0, dropped=0,0,0 latency : target=0, window=0, percentile=100.00%, depth=1 Run status group 0 (all jobs): READ: bw=27.3MiB/s (28.6MB/s), 27.3MiB/s-27.3MiB/s (28.6MB/s-28.6MB/s), io=3278MiB (3438MB), run=120000-120000msec WRITE: bw=27.3MiB/s (28.6MB/s), 27.3MiB/s-27.3MiB/s (28.6MB/s-28.6MB/s), io=3274MiB (3434MB), run=120000-120000msec Disk stats (read/write): nvme0n1: ios=838322/651596, merge=0/0, ticks=83804/22119, in_queue=104773, util=70.18% Is there a way I can get near bare metal performance from my nvme block devices? NOTICE Please consider the environment before printing this email. This message and any attachments are intended for the addressee named and may contain legally privileged/confidential/copyright information. 
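A rough sketch of the kind of per-image/per-flavor flags Tim points to above; the flavor and image names are placeholders, the properties are standard Nova/Glance metadata, and whether each one actually helps depends on the hypervisor, host configuration and release, so benchmark before and after:

  # pin guest vCPUs and back the guest with hugepages (flavor extra specs)
  openstack flavor set db.large --property hw:cpu_policy=dedicated --property hw:mem_page_size=large
  # present the disk over virtio-scsi instead of the default bus (image properties)
  openstack image set centos7-db --property hw_disk_bus=scsi --property hw_scsi_model=virtio-scsi

Re-running the same fio job with a deeper queue (e.g. --iodepth=32) on both the bare-metal and in-guest devices would also show whether the remaining gap is per-I/O submission overhead rather than a device bandwidth limit.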
If you are not the intended recipient, you should not read, use, disclose, copy or distribute this communication. If you have received this message in error please notify us at once by return email and then delete both messages. We accept no liability for the distribution of viruses or similar in electronic communications. This notice should not be removed. -------------- next part -------------- An HTML attachment was scrubbed... URL: From shyam.biradar at trilio.io Thu May 2 14:05:37 2019 From: shyam.biradar at trilio.io (Shyam Biradar) Date: Thu, 2 May 2019 19:35:37 +0530 Subject: kolla-ansible pike - nova_compute containers not starting Message-ID: Hi, I am setting up all-in-one ubuntu based kolla-ansible pike openstack. Deployment is failing at following ansible task: TASK [nova : include_tasks] ********************************************************************************************************************** included: /root/virtnev/share/kolla-ansible/ansible/roles/nova/tasks/discover_computes.yml for localhost TASK [nova : Waiting for nova-compute service up] ************************************************************************************************ FAILED - RETRYING: Waiting for nova-compute service up (20 retries left). FAILED - RETRYING: Waiting for nova-compute service up (19 retries left). FAILED - RETRYING: Waiting for nova-compute service up (18 retries left). FAILED - RETRYING: Waiting for nova-compute service up (17 retries left). FAILED - RETRYING: Waiting for nova-compute service up (16 retries left). FAILED - RETRYING: Waiting for nova-compute service up (15 retries left). FAILED - RETRYING: Waiting for nova-compute service up (14 retries left). FAILED - RETRYING: Waiting for nova-compute service up (13 retries left). FAILED - RETRYING: Waiting for nova-compute service up (12 retries left). FAILED - RETRYING: Waiting for nova-compute service up (11 retries left). FAILED - RETRYING: Waiting for nova-compute service up (10 retries left). FAILED - RETRYING: Waiting for nova-compute service up (9 retries left). FAILED - RETRYING: Waiting for nova-compute service up (8 retries left). FAILED - RETRYING: Waiting for nova-compute service up (7 retries left). FAILED - RETRYING: Waiting for nova-compute service up (6 retries left). FAILED - RETRYING: Waiting for nova-compute service up (5 retries left). FAILED - RETRYING: Waiting for nova-compute service up (4 retries left). FAILED - RETRYING: Waiting for nova-compute service up (3 retries left). FAILED - RETRYING: Waiting for nova-compute service up (2 retries left). FAILED - RETRYING: Waiting for nova-compute service up (1 retries left). fatal: [localhost -> localhost]: FAILED! => {"attempts": 20, "changed": false, "cmd": ["docker", "exec", "kolla_toolbox", "openstack", "--os-interface", "internal", "--os-auth-url", "http://192.168.122.151:35357", "--os-identity-api-version", "3", "--os-project-domain-name", "default", "--os-tenant-name", "admin", "--os-username", "admin", "--os-password", "ivpu1km8qxnVQESvAF4cyTFstOvrbxGUHjFF15gZ", "--os-user-domain-name", "default", "compute", "service", "list", "-f", "json", "--service", "nova-compute"], "delta": "0:00:02.555356", "end": "2019-05-02 09:24:45.485786", "rc": 0, "start": "2019-05-02 09:24:42.930430", "stderr": "", "stderr_lines": [], "stdout": "[]", "stdout_lines": ["[]"]} -------------------------------------------------------------------- I can see following stack trace in nova-compute container log 4. 
2019-05-02 08:21:30.522 7 INFO nova.service [-] Starting compute node (version 16.1.7) 2019-05-02 08:21:30.524 7 ERROR oslo_service.service [-] Error starting thread.: PlacementNotConfigured: This compute is not configured to talk to the placement service. Configure the [placement] section of nova.conf and restart the service. 2019-05-02 08:21:30.524 7 ERROR oslo_service.service Traceback (most recent call last): 2019-05-02 08:21:30.524 7 ERROR oslo_service.service File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_service/service.py", line 721, in run_service 2019-05-02 08:21:30.524 7 ERROR oslo_service.service service.start() 2019-05-02 08:21:30.524 7 ERROR oslo_service.service File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/service.py", line 156, in start 2019-05-02 08:21:30.524 7 ERROR oslo_service.service self.manager.init_host() 2019-05-02 08:21:30.524 7 ERROR oslo_service.service File "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/compute/manager.py", line 1155, in init_host 2019-05-02 08:21:30.524 7 ERROR oslo_service.service raise exception.PlacementNotConfigured() 2019-05-02 08:21:30.524 7 ERROR oslo_service.service PlacementNotConfigured: This compute is not configured to talk to the placement service. Configure the [placement] section of nova.conf and restart the service. 2019-05-02 08:21:30.524 7 ERROR oslo_service.service 2019-05-02 08:21:59.229 7 INFO os_vif [-] Loaded VIF plugins: ovs, linux_bridge --------------------------------------------------------------------- I saw nova-compute nova.conf has [placement] section configured well and it's same as nova_api's placement section. Other nova containers are started well. Any thoughts? [image: logo] *Shyam Biradar* * Software Engineer | DevOps* M +91 8600266938 | shyam.biradar at trilio.io | trilio.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From hoanq13 at viettel.com.vn Sat May 4 06:50:53 2019 From: hoanq13 at viettel.com.vn (hoanq13 at viettel.com.vn) Date: Sat, 4 May 2019 13:50:53 +0700 (ICT) Subject: [Vitrage] add datasource kapacitor for vitrage In-Reply-To: References: <14511424.947437.1556614048877.JavaMail.zimbra@viettel.com.vn> <1324083046.973516.1556615406841.JavaMail.zimbra@viettel.com.vn> Message-ID: <1913467486.1561697.1556952653279.JavaMail.zimbra@viettel.com.vn> Hi, All the test are pass, hope you review soon. Best regards Hoa ----- Original Message ----- From: eyalb1 at gmail.com To: hoanq13 at viettel.com.vn Cc: openstack-discuss at lists.openstack.org Sent: Thursday, May 2, 2019 2:12:12 PM Subject: Re: [Vitrage] add datasource kapacitor for vitrage Hi, Please make sure all test are passing Eyal On Thu, May 2, 2019, 02:18 < hoanq13 at viettel.com.vn > wrote: Hi, In our system, we use monitor by TICK stack (include: Telegraf for collect metric, InfluxDB for storage metric, Chronograf for visualize and Kapacitor alarming), which is popular monitor solution. We hope can integrate vitrage in, so we decide to write kapacitor datasource contribute for vitrage. The work is almost done , you can review in: https://review.opendev.org/653416 So i send this mail hope for more review, ideal,... Appreciate it. also ask: have any step i miss in pipeline of contribute datasource vitrage? like create blueprints, vitrage-spec,vv.. Should i do it? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stefano at canepa.ge.it Sun May 5 21:51:00 2019 From: stefano at canepa.ge.it (Stefano Canepa) Date: Sun, 5 May 2019 15:51:00 -0600 Subject: [openstack-ansible][monasca][zaqar][watcher][searchlight] Retirement of unused OpenStack Ansible roles In-Reply-To: References: <236ef912-21c5-4345-98ce-067499921af1@www.fastmail.com> Message-ID: Hi all, I would like to maintain monasca related roles but I have to double check how much time I can allocate to this task. Please hold before retiring them. All the best Stefano Stefano Canepa sc at linux.it or stefano at canepa.ge.it On Wed, 24 Apr 2019, 14:51 Mohammed Naser, wrote: > Hi, > > These roles have been broken for over a year now, some are not even > integrated with the OpenStack Ansible integrated repository. > > I think it's safe to say that for the most part, they have no users or > consumers unless someone has integrated it downstream somewhere and > didn't push that back out. It is a lot of overhead to maintain roles, > we're a small team that has to manage a huge amount of roles and their > integration, while on paper, I'd love for someone to step in and help, > but no one has for over a year. > > If someone wants to step in and get those roles to catch up on all the > technical debt they've accumulated (because when we'd do fixes across > all roles, we would always leave them.. because they always failed > tests..) then we're one revert away from it. I have some thoughts on > how we can resolve this for the future, but they're much more long > term, but for now, the additional workload on our very short resourced > team is a lot. > > Thanks, > Mohammed > > On Wed, Apr 24, 2019 at 8:56 AM Guilherme Steinmüller > wrote: > > > > Hello Witek and Jean-Philippe. > > > > I will hold off the retirement process until the end of PTG. > > > > Just for your information, that's what we have until now > https://review.opendev.org/#/q/topic:retire-osa-unused-roles+(status:open+OR+status:merged) > . > > > > I just -w the monsca roles as they were the only roles someone > manifested interest. > > > > Regards > > > > On Wed, Apr 24, 2019 at 8:14 AM Jean-Philippe Evrard < > jean-philippe at evrard.me> wrote: > >> > >> I am not sure this follows our documented retirement process, and it > seems very early to do so for some roles. > >> I think we should discuss role retirement at the next PTG (if we want > to change that process). > >> > >> In the meantime, I encourage people from the > monasca/zaqar/watcher/searchlight community interested deploying with > openstack-ansible to step up and take over their respective role's > maintainance. > >> > >> Regards, > >> Jean-Philippe Evrard (evrardjp). > >> > > > -- > Mohammed Naser — vexxhost > ----------------------------------------------------- > D. 514-316-8872 > D. 800-910-1726 ext. 200 > E. mnaser at vexxhost.com > W. http://vexxhost.com > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chenjiengu at gmail.com Wed May 8 11:48:32 2019 From: chenjiengu at gmail.com (=?UTF-8?B?6ZmI5p2w?=) Date: Wed, 8 May 2019 19:48:32 +0800 Subject: topic: ironic boot from cinder volume(the cinder volume backend is ceph storage) Message-ID: Nowdays , the opestack rocky release ironic , is support ironic boot from cinder volume(the cinder volume backend is ceph storage)? My goal is to achieve this. Who can tell me about this principle? looking forward to a reply thank you all. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From zackchen517 at gmail.com Wed May 8 12:24:36 2019 From: zackchen517 at gmail.com (zack chen) Date: Wed, 8 May 2019 20:24:36 +0800 Subject: Baremetal attach volume in Multi-tenancy Message-ID: Hi, I am looking for a mechanism that can be used for baremetal attach volume in a multi-tenant scenario. In addition we use ceph as the backend storage for cinder. Can anybody give me some advice? -------------- next part -------------- An HTML attachment was scrubbed... URL: From rafaelweingartner at gmail.com Wed May 8 15:08:55 2019 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Wed, 8 May 2019 12:08:55 -0300 Subject: [telemetry] Team meeting agenda for tomorrow In-Reply-To: References: Message-ID: Thanks, I'll be there. Em qua, 8 de mai de 2019 11:41, Trinh Nguyen escreveu: > Hi Rafael, > > The meeting will be held on the IRC channel #openstack-telemetry as > mentioned in the previous email. > > Thanks, > > On Wed, May 8, 2019 at 10:50 PM Rafael Weingärtner < > rafaelweingartner at gmail.com> wrote: > >> Hello Trinh, >> Where does the meeting happen? Will it be via IRC Telemetry channel? Or, >> in the Etherpad ( >> https://etherpad.openstack.org/p/telemetry-meeting-agenda)? I would like >> to discuss and understand a bit better the context behind the Telemetry >> events deprecation. >> >> On Wed, May 8, 2019 at 12:19 AM Trinh Nguyen >> wrote: >> >>> Hi team, >>> >>> As planned, we will have a team meeting at 02:00 UTC, May 9th on >>> #openstack-telemetry to discuss what we gonna do for the next milestone >>> (Train-1) and continue what we left off from the last meeting. >>> >>> I put here [1] the agenda thinking that it should be fine for an hour >>> meeting. If you have anything to talk about, please put it there too. >>> >>> [1] https://etherpad.openstack.org/p/telemetry-meeting-agenda >>> >>> >>> Bests, >>> >>> -- >>> *Trinh Nguyen* >>> *www.edlab.xyz * >>> >>> >> >> -- >> Rafael Weingärtner >> > > > -- > *Trinh Nguyen* > *www.edlab.xyz * > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jungleboyj at gmail.com Wed May 8 15:14:54 2019 From: jungleboyj at gmail.com (Jay Bryant) Date: Wed, 8 May 2019 10:14:54 -0500 Subject: [cinder] Python3 requirements for Train In-Reply-To: References: Message-ID: <2ab79d3a-469e-ca97-5ffc-3bd9a8015251@gmail.com> All, One additional note.  Drivers that fail to have Python 3 testing running in their CI environment by Milestone 2 will have a patch pushed up that will mark the driver as unsupported. Jay On 5/8/2019 10:04 AM, Walter Boring wrote: > Hello Cinder folks, >    The train release is going to be the last release of OpenStack with > python 2 support.  Train also is going to require supporting python > 3.6 and 3.7.  This means that we should be enabling and or switching > over all of our 3rd party CI runs to python 3 to ensure that our > drivers and all of their required libraries run properly in a python > 3.6/3.7 environment.  This will help driver maintainers discover any > python3 incompatibilities with their driver as well as any required > libraries.  At the PTG in Denver, the cinder team agreed that we > wanted driver CI systems to start using python3 by milestone 2 for > Train.  This would be the July 22-26th time frame [1]. > > >   We are also working on adding driver library requirements to the > OpenStack global requirements project [2] [3]. This effort will > provide native install primitives for driver libraries in cinder. 
This > process also requires the driver libraries to run in python3.6/3.7. > > > The Cinder team wants to maintain it's high quality of driver support > in the train release.  By enabling python 3.6 and python 3.7 in CI > tests, this will help everyone ship Cinder with the required support > in Train and the following releases. > > Walt > > [1] https://releases.openstack.org/train/schedule.html > [2] https://review.opendev.org/#/c/656724/ > [3] https://review.opendev.org/#/c/657395/ From waboring at hemna.com Wed May 8 15:28:21 2019 From: waboring at hemna.com (Walter Boring) Date: Wed, 8 May 2019 11:28:21 -0400 Subject: Baremetal attach volume in Multi-tenancy In-Reply-To: References: Message-ID: To attach to baremetal instance, you will need to install the cinderclient along with the python-brick-cinderclient-extension inside the instance itself. On Wed, May 8, 2019 at 11:15 AM zack chen wrote: > Hi, > I am looking for a mechanism that can be used for baremetal attach volume > in a multi-tenant scenario. In addition we use ceph as the backend storage > for cinder. > > Can anybody give me some advice? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From skaplons at redhat.com Wed May 8 15:30:57 2019 From: skaplons at redhat.com (Slawomir Kaplonski) Date: Wed, 8 May 2019 17:30:57 +0200 Subject: [Neutron] Can I create VM on flat network which doesnt have any subnet attached to it. In-Reply-To: References: Message-ID: <884636FB-5942-4A7B-B815-41FE708626BD@redhat.com> Hi, I don’t think this is currently supported. But there is ongoing work to add support for such feature. See [1] for details. [1] https://review.opendev.org/#/c/641670/ > On 3 May 2019, at 13:01, Akki yadav wrote: > > Hello Team, > > Hope you all are doing good. I wanted to know that can I launch a VM on a flat network directly which doesn't have any subnet attached to it. > > Steps to be followed: > Create a flat Network without a subnet > Attach the network to create a VM. > > Aim :- Spawn 2 VM's on network without any subnet, without any IP assigned to them. Then statically allocate same subnet IP to them and ping each other. > > Issue:- VM creation is getting failed stating that there is no subnet found. > > How can we resolve this? > > Thanks & Regards > Akshay — Slawek Kaplonski Senior software engineer Red Hat From mriedemos at gmail.com Wed May 8 15:34:26 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 8 May 2019 10:34:26 -0500 Subject: [nova][ops] 'Duplicate entry for primary key' problem running nova-manage db archive_deleted_rows In-Reply-To: References: Message-ID: <9a74e1c1-86bf-4dde-5885-8faa626a79ff@gmail.com> On 5/8/2019 10:04 AM, Massimo Sgaravatto wrote: > The problem is not for that single entry > Looks like the auto_increment for that table was reset (I  don't know > when-how) Just purge your shadow tables. As Surya noted, there is a purge CLI in nova-manage on newer releases now which would do the same thing. You can either backport that, or simply run it in a container or virtualenv, or just do it manually. If you're paranoid, purge the entries that were created before ocata. 
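A minimal sketch of the sequence Matt describes, assuming the Rocky-or-newer nova-manage purge CLI he mentions (backported, or run from a container/virtualenv against the same database, on older deployments); the cut-off date is purely illustrative:

  # clear the shadow tables holding the previously-archived rows that now collide
  nova-manage db purge --all --verbose
  # or, to keep recent archived rows, only drop the old ones:
  # nova-manage db purge --before 2017-02-22 --verbose
  # then re-run the archive step
  nova-manage db archive_deleted_rows --until-complete

Check nova-manage db purge --help on whichever version ends up being used, since these flags only exist in the newer CLI.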
-- Thanks, Matt From jungleboyj at gmail.com Wed May 8 15:35:06 2019 From: jungleboyj at gmail.com (Jay Bryant) Date: Wed, 8 May 2019 10:35:06 -0500 Subject: [cinder] Help with a review please In-Reply-To: <55F040AF-16C8-4029-B306-7E81B4BE191A@gmail.com> References: <55F040AF-16C8-4029-B306-7E81B4BE191A@gmail.com> Message-ID: Sam, Thank you for reaching out to the mailing list on this issue.  I am sorry that the review has been stuck in something of a limbo for quite some time.  This is not the developer experience we strive for as a team. Since it appears that we are having trouble reaching agreement as to whether this is a good change I would recommend bringing this topic up at our next weekly meeting so that we can all work out the details together. If you would like to discuss this issue please add it to the agenda for the next meeting [1]. Thanks! Jay [1] https://etherpad.openstack.org/p/cinder-train-meetings On 5/8/2019 2:51 AM, Sam Morrison wrote: > Hi, > > I’ve had a review going on for over 8 months now [1] and would love to > get this in, it’s had +2s over the period and keeps getting nit > picked, finally being knocked back due to no spec which there now is [2] > This is now stalled itself after having a +2 and it is very depressing. > > I have had generally positive experiences contributing to openstack > but this has been a real pain, is there something I can do to make > this go smoother? > > Thanks, > Sam > > > [1] https://review.opendev.org/#/c/599866/ > [2] https://review.opendev.org/#/c/645056/ From jungleboyj at gmail.com Wed May 8 15:41:13 2019 From: jungleboyj at gmail.com (Jay Bryant) Date: Wed, 8 May 2019 10:41:13 -0500 Subject: topic: ironic boot from cinder volume(the cinder volume backend is ceph storage) In-Reply-To: References: Message-ID: This is going to require being able to export Ceph volumes via iSCSI.  The Ironic team communicated the importance of this feature to the Cinder team a few months ago. We are working on getting this support in place soon but it probably will not be until the U release. Thanks! Jay On 5/8/2019 6:48 AM, 陈杰 wrote: > Nowdays , the opestack rocky release ironic , is support ironic boot > from cinder volume(the cinder volume backend is ceph storage)? My goal > is to achieve this. > Who can tell me about this principle? > looking forward to a reply > thank you all. From aspiers at suse.com Wed May 8 15:45:11 2019 From: aspiers at suse.com (Adam Spiers) Date: Wed, 8 May 2019 16:45:11 +0100 Subject: [nova][all][ptg] Summary: Same-Company Approvals In-Reply-To: <20190508143923.bhmla62qi2p7yc7s@yuggoth.org> References: <20190508143923.bhmla62qi2p7yc7s@yuggoth.org> Message-ID: <20190508154511.njvidentht4d4zim@pacific.linksys.moosehall> Jeremy Stanley wrote: >On 2019-05-07 15:06:10 -0500 (-0500), Jay Bryant wrote: >>Cinder has been working with the same unwritten rules for quite some time as >>well with minimal issues. >> >>I think the concerns about not having it documented are warranted.  We have >>had question about it in the past with no documentation to point to.  It is >>more or less lore that has been passed down over the releases.  :-) >> >>At a minimum, having this e-mail thread is helpful.  If, however, we decide >>to document it I think we should have it consistent across the teams that >>use the rule.  I would be happy to help draft/review any such documentation. >[...] 
> >I have a feeling that a big part of why it's gone undocumented for >so long is that putting it in writing risks explicitly sending the >message that we don't trust our contributors to act in the best >interests of the project even if those are not aligned with the >interests of their employer/sponsor. I think many of us attempt to >avoid having all activity on a given patch come from people with the >same funding affiliation so as to avoid giving the impression that >any one organization is able to ram changes through with no >oversight, but more because of the outward appearance than because >we don't trust ourselves or our colleagues. > >Documenting our culture is a good thing, but embodying that >documentation with this sort of nuance can be challenging. That's a good point. Maybe that risk could be countered by explicitly stating something like "this is not currently an issue within the community, and it has rarely, if ever, been one in the past; therefore this policy is a preemptive safeguard rather than a reactive one" ? From jaypipes at gmail.com Wed May 8 15:50:52 2019 From: jaypipes at gmail.com (Jay Pipes) Date: Wed, 8 May 2019 11:50:52 -0400 Subject: [ops][nova]Logging in nova and other openstack projects In-Reply-To: <53BF2204-988C-4ED6-A687-F6188B90C547@planethoster.info> References: <62034C21-91FC-4A9A-BC4B-47E372EAB925@planethoster.info> <53BF2204-988C-4ED6-A687-F6188B90C547@planethoster.info> Message-ID: <189efcf0-07f4-d5eb-b17b-658684ad0bbb@gmail.com> Sweet! :) Glad it worked! Best, -jay On 05/08/2019 10:43 AM, Jean-Philippe Méthot wrote: > Hi, > > Indeed, the remaining info messages were coming from the nova-compute > resource tracker. Adding nova=WARN in the list did remove these > messages. Thank you very much for your help. > > Best regards, > > Jean-Philippe Méthot > Openstack system administrator > Administrateur système Openstack > PlanetHoster inc. > > > > >> Le 8 mai 2019 à 09:35, Jay Pipes > > a écrit : >> >> Sorry for delayed response... comments inline. >> >> On 05/07/2019 05:31 PM, Jean-Philippe Méthot wrote: >>> Indeed, this is what was written in your original response as well as >>> in the documentation. As a result, it was fairly difficult to miss >>> and I did comment it out before restarting the service. Additionally, >>> as per the configuration I had set up, had the log-config-append >>> option be set, I wouldn’t have any INFO level log in my logs. Hence >>> why I believe it is strange that I have info level logs, when I’ve >>> set default_log_levels like this: >>> default_log_levels >>> = amqp=WARN,amqplib=WARN,boto=WARN,qpid=WARN,sqlalchemy=WARN,suds=WARN,oslo.messaging=WARN,iso8601=WARN,requests.packages.urllib3.connectionpool=WARN,urllib3.connectionpool=WARN,websocket=WARN,requests.packages.urllib3.util.retry=WARN,urllib3.util.retry=WARN,keystonemiddleware=WARN,routes.middleware=WARN,stevedore=WARN,taskflow=WARN,keystoneauth=WARN,oslo.cache=WARN >> >> Do you see any of the above modules logging with INFO level, though? >> Or are you just seeing other modules (e.g. nova.*) logging at INFO level? >> >> If you are only seeing nova modules logging at INFO level, try adding: >> >> ,nova=WARN >> >> to the default_log_levels CONF option. 
>> >> Let us know if this works :) >> >> Best, >> -jay >> > From Tim.Bell at cern.ch Wed May 8 15:55:18 2019 From: Tim.Bell at cern.ch (Tim Bell) Date: Wed, 8 May 2019 15:55:18 +0000 Subject: topic: ironic boot from cinder volume(the cinder volume backend is ceph storage) In-Reply-To: References: Message-ID: Just brainstorming.... Would it be possible to set up a couple of VMs as iscsi LIO gateways by hand while this feature is being developed and using that end point to boot an Ironic node? You may also be on a late enough version of Ceph to do it using http://docs.ceph.com/docs/mimic/rbd/iscsi-overview/. Not self-service but could work for a few cases.. Tim -----Original Message----- From: Jay Bryant Reply-To: "jsbryant at electronicjungle.net" Date: Wednesday, 8 May 2019 at 17:46 To: "openstack-discuss at lists.openstack.org" Subject: Re: topic: ironic boot from cinder volume(the cinder volume backend is ceph storage) This is going to require being able to export Ceph volumes via iSCSI. The Ironic team communicated the importance of this feature to the Cinder team a few months ago. We are working on getting this support in place soon but it probably will not be until the U release. Thanks! Jay On 5/8/2019 6:48 AM, 陈杰 wrote: > Nowdays , the opestack rocky release ironic , is support ironic boot > from cinder volume(the cinder volume backend is ceph storage)? My goal > is to achieve this. > Who can tell me about this principle? > looking forward to a reply > thank you all. From mriedemos at gmail.com Wed May 8 15:58:30 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 8 May 2019 10:58:30 -0500 Subject: [nova][ptg] Summary: Implicit trait-based filters In-Reply-To: <1557213589.2232.0@smtp.office365.com> References: <1557213589.2232.0@smtp.office365.com> Message-ID: On 5/7/2019 2:19 AM, Balázs Gibizer wrote: > 3) The request pre-filters [7] run before the placement a_c query is > generated. But these today changes the fields of the RequestSpec (e.g. > requested_destination) that would mean the regeneration of > RequestSpec.requested_resources would be needed. This probably solvable > by changing the pre-filters to work directly on > RequestSpec.requested_resources after we solved all the other issues. Yeah this is something I ran into while hacking on the routed networks aggregate stuff [1]. I added information to the RequestSpec so I could use it in a pre-filter (required aggregates) but I can't add that to the requested_resources in the RequestSpec without resources (and in the non-bw port case there is no RequestSpec.requested_resources yet), so what I did was hack the unnumbered RequestGroup after the pre-filters and after the RequestSpec was processed by resources_from_request_spec, but before the code that makes the GET /a_c call. It's definitely ugly and I'm not even sure it works yet (would need functional testing). What I've wondered is if there is a way we could merge request groups in resources_from_request_spec so if a pre-filter added an unnumbered RequestGroup to the RequestSpec (via the requestd_resources attribute) that resources_from_request_spec would then merge in the flavor information. That's what I initially tried with the multiattach required traits patch [2] but the groups weren't merged for whatever reason and GET /a_c failed because I had a group with a required trait but no resources. 
[1] https://review.opendev.org/#/c/656885/3/nova/scheduler/manager.py [2] https://review.opendev.org/#/c/645316/ -- Thanks, Matt From mriedemos at gmail.com Wed May 8 16:03:17 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 8 May 2019 11:03:17 -0500 Subject: [nova][cinder][ptg] Summary: Swap volume woes In-Reply-To: <20190506131834.nyc7k7qltdsmamuq@lyarwood.usersys.redhat.com> References: <20190506131834.nyc7k7qltdsmamuq@lyarwood.usersys.redhat.com> Message-ID: On 5/6/2019 8:18 AM, Lee Yarwood wrote: > - Deprecate the existing swap volume API in Train, remove in U. I don't remember this coming up. Deprecation is one thing if we have an alternative, but removal isn't really an option. Yes we have 410'ed some REST APIs for removed services (nova-network, nova-cells) but for the most part we're married to our REST APIs so we can deprecate things to signal "don't use these anymore" but that doesn't mean we can just delete them. This is why we require a spec for all API changes, because of said marriage. -- Thanks, Matt From ed at leafe.com Wed May 8 16:04:18 2019 From: ed at leafe.com (Ed Leafe) Date: Wed, 8 May 2019 11:04:18 -0500 Subject: [nova][all][ptg] Summary: Same-Company Approvals In-Reply-To: <20190508154511.njvidentht4d4zim@pacific.linksys.moosehall> References: <20190508143923.bhmla62qi2p7yc7s@yuggoth.org> <20190508154511.njvidentht4d4zim@pacific.linksys.moosehall> Message-ID: <78C13304-E630-43FD-BDA5-0C43FBDA8B29@leafe.com> On May 8, 2019, at 10:45 AM, Adam Spiers wrote: > >> I have a feeling that a big part of why it's gone undocumented for so long is that putting it in writing risks explicitly sending the message that we don't trust our contributors to act in the best interests of the project even if those are not aligned with the interests of their employer/sponsor. I think many of us attempt to avoid having all activity on a given patch come from people with the same funding affiliation so as to avoid giving the impression that any one organization is able to ram changes through with no oversight, but more because of the outward appearance than because we don't trust ourselves or our colleagues. >> Documenting our culture is a good thing, but embodying that documentation with this sort of nuance can be challenging. > > That's a good point. Maybe that risk could be countered by explicitly stating something like "this is not currently an issue within the community, and it has rarely, if ever, been one in the past; therefore this policy is a preemptive safeguard rather than a reactive one" ? I think that’s a good approach. This way if such a situation comes up and people wonder why others are questioning it, it will be all above-board. The downside of *not* documenting this concern is that in the future if it is ever needed to be mentioned, the people involved might feel that the community is suddenly ganging up against their company, instead of simply following documented policy. -- Ed Leafe From jungleboyj at gmail.com Wed May 8 16:04:21 2019 From: jungleboyj at gmail.com (Jay Bryant) Date: Wed, 8 May 2019 11:04:21 -0500 Subject: topic: ironic boot from cinder volume(the cinder volume backend is ceph storage) In-Reply-To: References: Message-ID: <54148c1b-ce06-ae7b-1c08-0b5a6ceba4f3@gmail.com> Tim, Good thought.  That would be an interim solution until we are able to get the process automated. Jay On 5/8/2019 10:55 AM, Tim Bell wrote: > Just brainstorming.... 
> > Would it be possible to set up a couple of VMs as iscsi LIO gateways by hand while this feature is being developed and using that end point to boot an Ironic node? You may also be on a late enough version of Ceph to do it using http://docs.ceph.com/docs/mimic/rbd/iscsi-overview/. > > Not self-service but could work for a few cases.. > > Tim > > -----Original Message----- > From: Jay Bryant > Reply-To: "jsbryant at electronicjungle.net" > Date: Wednesday, 8 May 2019 at 17:46 > To: "openstack-discuss at lists.openstack.org" > Subject: Re: topic: ironic boot from cinder volume(the cinder volume backend is ceph storage) > > This is going to require being able to export Ceph volumes via iSCSI. > The Ironic team communicated the importance of this feature to the > Cinder team a few months ago. > > We are working on getting this support in place soon but it probably > will not be until the U release. > > Thanks! > > Jay > > > On 5/8/2019 6:48 AM, 陈杰 wrote: > > Nowdays , the opestack rocky release ironic , is support ironic boot > > from cinder volume(the cinder volume backend is ceph storage)? My goal > > is to achieve this. > > Who can tell me about this principle? > > looking forward to a reply > > thank you all. > > > From mriedemos at gmail.com Wed May 8 16:07:44 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 8 May 2019 11:07:44 -0500 Subject: [nova][ptg] Summary: Tech Debt In-Reply-To: References: Message-ID: On 5/6/2019 3:12 PM, Eric Fried wrote: > - Remove the nova-console I'm still not clear on this one. Someone from Rackspace (Matt DePorter) said at the summit that they are still using xen and still rely on the nova-console service. Citrix supports Rackspace and Bob (Citrix) said we could drop the nova-console service, so I'm not sure what to make of the support matrix here - can we drop it or not for xen users? Is there an alternative? On the other hand, there was an undercurrent of support for deprecating the xenapi driver since it's not really maintained anymore and CI hasn't worked on it for several months. So if we go that route, what would the plan be? Deprecate the driver in Train and if no one steps up to maintain it and get CI working, drop it in U along with the nova-console service and xvp console? -- Thanks, Matt From jasonanderson at uchicago.edu Wed May 8 16:14:09 2019 From: jasonanderson at uchicago.edu (Jason Anderson) Date: Wed, 8 May 2019 16:14:09 +0000 Subject: topic: ironic boot from cinder volume(the cinder volume backend is ceph storage) In-Reply-To: <54148c1b-ce06-ae7b-1c08-0b5a6ceba4f3@gmail.com> References: , <54148c1b-ce06-ae7b-1c08-0b5a6ceba4f3@gmail.com> Message-ID: Tim, Jay -- I looked in to this recently as it was a use-case some of our HPC users wanted support for. I noticed that Ceph has the iSCSI gateway, but my impression was that this wouldn't work without adding some sort of new driver in Cinder. Is that not true? I thought that Cinder only Ceph via RBD. I'd be happy to be proven wrong on this. Cheers, /Jason ________________________________ From: Jay Bryant Sent: Wednesday, May 8, 2019 11:04 To: openstack-discuss at lists.openstack.org Subject: Re: topic: ironic boot from cinder volume(the cinder volume backend is ceph storage) Tim, Good thought. That would be an interim solution until we are able to get the process automated. Jay On 5/8/2019 10:55 AM, Tim Bell wrote: > Just brainstorming.... 
> > Would it be possible to set up a couple of VMs as iscsi LIO gateways by hand while this feature is being developed and using that end point to boot an Ironic node? You may also be on a late enough version of Ceph to do it using http://docs.ceph.com/docs/mimic/rbd/iscsi-overview/. > > Not self-service but could work for a few cases.. > > Tim > > -----Original Message----- > From: Jay Bryant > Reply-To: "jsbryant at electronicjungle.net" > Date: Wednesday, 8 May 2019 at 17:46 > To: "openstack-discuss at lists.openstack.org" > Subject: Re: topic: ironic boot from cinder volume(the cinder volume backend is ceph storage) > > This is going to require being able to export Ceph volumes via iSCSI. > The Ironic team communicated the importance of this feature to the > Cinder team a few months ago. > > We are working on getting this support in place soon but it probably > will not be until the U release. > > Thanks! > > Jay > > > On 5/8/2019 6:48 AM, 陈杰 wrote: > > Nowdays , the opestack rocky release ironic , is support ironic boot > > from cinder volume(the cinder volume backend is ceph storage)? My goal > > is to achieve this. > > Who can tell me about this principle? > > looking forward to a reply > > thank you all. > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Wed May 8 16:18:42 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 8 May 2019 11:18:42 -0500 Subject: [nova][neutron][ptg] Summary: Leaking resources when ports are deleted out-of-band In-Reply-To: <1556989044.27606.0@smtp.office365.com> References: <62ef48e0-9425-9191-a648-c1009c1032b7@fried.cc> <1556919312.16566.2@smtp.office365.com> <5f87ea30-0bdf-31a4-a3f5-0e9d201b3665@gmail.com> <1556989044.27606.0@smtp.office365.com> Message-ID: On 5/4/2019 11:57 AM, Balázs Gibizer wrote: > The failure to detach a port via nova while the nova-compute is down > could be a bug on nova side. Depends on what you mean by detach. If the compute is down while deleting the server, the API will still call the (internal to nova) network API code [1] to either (a) unbind ports that nova didn't create or (2) delete ports that nova did create. For the policy change where the port has to be unbound to delete it, we'd already have support for that, it's just an extra step. At the PTG I was groaning a bit about needing to add another step to delete a port from the nova side, but thinking about it more we have to do the exact same thing with cinder volumes (we have to detach them before deleting them), so I guess it's not the worst thing ever. [1] https://github.com/openstack/nova/blob/56fef7c0e74d7512f062c4046def10401df16565/nova/compute/api.py#L2291 -- Thanks, Matt From jungleboyj at gmail.com Wed May 8 16:21:33 2019 From: jungleboyj at gmail.com (Jay Bryant) Date: Wed, 8 May 2019 11:21:33 -0500 Subject: topic: ironic boot from cinder volume(the cinder volume backend is ceph storage) In-Reply-To: References: <54148c1b-ce06-ae7b-1c08-0b5a6ceba4f3@gmail.com> Message-ID: <6fb96df4-9b91-a0f9-933a-a806205409cf@gmail.com> Jason, You are correct.  The plan is to add a driver that will use the iSCSI gateway to make volumes available instead of using RBD commands.  So, the driver will be heavily based on the existing RBD driver but do the export via iSCSI gateway. Unfortunately, the iSCSI Gateway CLI is not well suited to remote execution so we have Walt Boring looking into better ways of interacting with the gateway or possibly updating the client to support our needs. 
If you want to see additional notes on the topic see our discussion from the PTG last week at around line 119.  [1] Thanks! Jay [1] https://etherpad.openstack.org/p/cinder-train-ptg-planning On 5/8/2019 11:14 AM, Jason Anderson wrote: > Tim, Jay -- > > I looked in to this recently as it was a use-case some of our HPC > users wanted support for. I noticed that Ceph has the iSCSI gateway, > but my impression was that this wouldn't work without adding some sort > of new driver in Cinder. Is that not true? I thought that Cinder only > Ceph via RBD. I'd be happy to be proven wrong on this. > > Cheers, > /Jason > ------------------------------------------------------------------------ > *From:* Jay Bryant > *Sent:* Wednesday, May 8, 2019 11:04 > *To:* openstack-discuss at lists.openstack.org > *Subject:* Re: topic: ironic boot from cinder volume(the cinder volume > backend is ceph storage) -------------- next part -------------- An HTML attachment was scrubbed... URL: From jasonanderson at uchicago.edu Wed May 8 16:24:45 2019 From: jasonanderson at uchicago.edu (Jason Anderson) Date: Wed, 8 May 2019 16:24:45 +0000 Subject: topic: ironic boot from cinder volume(the cinder volume backend is ceph storage) In-Reply-To: <6fb96df4-9b91-a0f9-933a-a806205409cf@gmail.com> References: <54148c1b-ce06-ae7b-1c08-0b5a6ceba4f3@gmail.com> , <6fb96df4-9b91-a0f9-933a-a806205409cf@gmail.com> Message-ID: Thanks Jay! So I guess if one wants to use the iSCSI gateway with Ironic now, one would have to use the 'external' storage interface available since Rocky and do the poking of Ceph out of band. That won't really work for our use case, but perhaps others could take advantage. I'm very grateful that the Cinder team is spending time on this! Cheers, /Jason ________________________________ From: Jay Bryant Sent: Wednesday, May 8, 2019 11:21 To: openstack-discuss at lists.openstack.org Subject: Re: topic: ironic boot from cinder volume(the cinder volume backend is ceph storage) Jason, You are correct. The plan is to add a driver that will use the iSCSI gateway to make volumes available instead of using RBD commands. So, the driver will be heavily based on the existing RBD driver but do the export via iSCSI gateway. Unfortunately, the iSCSI Gateway CLI is not well suited to remote execution so we have Walt Boring looking into better ways of interacting with the gateway or possibly updating the client to support our needs. If you want to see additional notes on the topic see our discussion from the PTG last week at around line 119. [1] Thanks! Jay [1] https://etherpad.openstack.org/p/cinder-train-ptg-planning On 5/8/2019 11:14 AM, Jason Anderson wrote: Tim, Jay -- I looked in to this recently as it was a use-case some of our HPC users wanted support for. I noticed that Ceph has the iSCSI gateway, but my impression was that this wouldn't work without adding some sort of new driver in Cinder. Is that not true? I thought that Cinder only Ceph via RBD. I'd be happy to be proven wrong on this. Cheers, /Jason ________________________________ From: Jay Bryant Sent: Wednesday, May 8, 2019 11:04 To: openstack-discuss at lists.openstack.org Subject: Re: topic: ironic boot from cinder volume(the cinder volume backend is ceph storage) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Tim.Bell at cern.ch Wed May 8 16:25:33 2019 From: Tim.Bell at cern.ch (Tim Bell) Date: Wed, 8 May 2019 16:25:33 +0000 Subject: topic: ironic boot from cinder volume(the cinder volume backend is ceph storage) In-Reply-To: References: <54148c1b-ce06-ae7b-1c08-0b5a6ceba4f3@gmail.com> Message-ID: <3B227E63-CA0E-4F88-8ED9-331008FF008D@cern.ch> I’m not sure you actually need full cinder support, see “Boot Without Cinder” in https://docs.openstack.org/ironic/latest/admin/boot-from-volume.html (Never tried it though ….) Tim From: Jason Anderson Date: Wednesday, 8 May 2019 at 18:17 To: "openstack-discuss at lists.openstack.org" , "jsbryant at electronicjungle.net" Subject: Re: topic: ironic boot from cinder volume(the cinder volume backend is ceph storage) Tim, Jay -- I looked in to this recently as it was a use-case some of our HPC users wanted support for. I noticed that Ceph has the iSCSI gateway, but my impression was that this wouldn't work without adding some sort of new driver in Cinder. Is that not true? I thought that Cinder only Ceph via RBD. I'd be happy to be proven wrong on this. Cheers, /Jason ________________________________ From: Jay Bryant Sent: Wednesday, May 8, 2019 11:04 To: openstack-discuss at lists.openstack.org Subject: Re: topic: ironic boot from cinder volume(the cinder volume backend is ceph storage) Tim, Good thought. That would be an interim solution until we are able to get the process automated. Jay On 5/8/2019 10:55 AM, Tim Bell wrote: > Just brainstorming.... > > Would it be possible to set up a couple of VMs as iscsi LIO gateways by hand while this feature is being developed and using that end point to boot an Ironic node? You may also be on a late enough version of Ceph to do it using http://docs.ceph.com/docs/mimic/rbd/iscsi-overview/. > > Not self-service but could work for a few cases.. > > Tim > > -----Original Message----- > From: Jay Bryant > Reply-To: "jsbryant at electronicjungle.net" > Date: Wednesday, 8 May 2019 at 17:46 > To: "openstack-discuss at lists.openstack.org" > Subject: Re: topic: ironic boot from cinder volume(the cinder volume backend is ceph storage) > > This is going to require being able to export Ceph volumes via iSCSI. > The Ironic team communicated the importance of this feature to the > Cinder team a few months ago. > > We are working on getting this support in place soon but it probably > will not be until the U release. > > Thanks! > > Jay > > > On 5/8/2019 6:48 AM, 陈杰 wrote: > > Nowdays , the opestack rocky release ironic , is support ironic boot > > from cinder volume(the cinder volume backend is ceph storage)? My goal > > is to achieve this. > > Who can tell me about this principle? > > looking forward to a reply > > thank you all. > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From openstack at fried.cc Wed May 8 16:28:24 2019 From: openstack at fried.cc (Eric Fried) Date: Wed, 8 May 2019 11:28:24 -0500 Subject: [nova][ptg] Summary: Resource Management Daemon In-Reply-To: Message-ID: <80626b5e-735e-dbfd-c211-6305048ceeda@fried.cc> > [1] There has been a recurring theme of needing "some kind of config" - > not necessarily nova.conf or any oslo.config - that can describe: > - Resource provider name/uuid/parentage, be it an existing provider or a > new nested provider; > - Inventory (e.g. last-level cache in this case); > - Physical resource(s) to which the inventory corresponds (e.g. "cache > ways" in this case); > - Traits, aggregates, other? 
> As of this writing, no specifics have been decided, even to the point of > positing that it could be the same file for some/all of the specs for > which the issue arose. A proposal extremely close to this has been in the works in various forms for about a year now, the latest iteration of which can be found at [2]. Up to this point, there has been a general lack of enthusiasm for it, probably because we just didn't have any really strong use cases yet. I think we do now, given that RMD and others (including [3]) have expressed a need for it in Train. As such, Dakshina and team have agreed to take over that spec and move forward with it. To be clear, this will drive toward a general-purpose resource provider customization/description mechanism, not be RMD-specific. efried [2] https://review.opendev.org/#/c/612497/ [3] http://lists.openstack.org/pipermail/openstack-discuss/2019-May/005802.html From morgan.fainberg at gmail.com Wed May 8 16:28:21 2019 From: morgan.fainberg at gmail.com (Morgan Fainberg) Date: Wed, 8 May 2019 09:28:21 -0700 Subject: [nova][all][ptg] Summary: Same-Company Approvals In-Reply-To: <78C13304-E630-43FD-BDA5-0C43FBDA8B29@leafe.com> References: <20190508143923.bhmla62qi2p7yc7s@yuggoth.org> <20190508154511.njvidentht4d4zim@pacific.linksys.moosehall> <78C13304-E630-43FD-BDA5-0C43FBDA8B29@leafe.com> Message-ID: On Wed, May 8, 2019, 09:09 Ed Leafe wrote: > On May 8, 2019, at 10:45 AM, Adam Spiers wrote: > > > >> I have a feeling that a big part of why it's gone undocumented for so > long is that putting it in writing risks explicitly sending the message > that we don't trust our contributors to act in the best interests of the > project even if those are not aligned with the interests of their > employer/sponsor. I think many of us attempt to avoid having all activity > on a given patch come from people with the same funding affiliation so as > to avoid giving the impression that any one organization is able to ram > changes through with no oversight, but more because of the outward > appearance than because we don't trust ourselves or our colleagues. > >> Documenting our culture is a good thing, but embodying that > documentation with this sort of nuance can be challenging. > > > > That's a good point. Maybe that risk could be countered by explicitly > stating something like "this is not currently an issue within the > community, and it has rarely, if ever, been one in the past; therefore this > policy is a preemptive safeguard rather than a reactive one" ? > > I think that’s a good approach. This way if such a situation comes up and > people wonder why others are questioning it, it will be all above-board. > The downside of *not* documenting this concern is that in the future if it > is ever needed to be mentioned, the people involved might feel that the > community is suddenly ganging up against their company, instead of simply > following documented policy. > > > -- Ed Leafe > In general I would rather see trust be pushed forward. The cores are explicitly trusted individuals within a team. I would encourage setting this policy aside and revisit if it ever becomes an issue. I think this policy, preemptive or not, highlights a lack of trust. It is another reason why Keystone team abolished the rule. AI.kuch prefer trusting the cores with landing code with or without external/additional input as they feel is appropriate. There are remedies if something lands inappropriately (revert, removal of core status, etc). 
As stated earlier in the quoted email, this has never or almost never been an issue. With that said, I don't have a strongly vested interest outside of seeing our community succeeding and being as open/inclusive as possible (since most contributions I am working on are not subject to this policy). As long as the policy isn't strictly tribal knowledge, we are headed in the right direction. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jungleboyj at gmail.com Wed May 8 16:31:17 2019 From: jungleboyj at gmail.com (Jay Bryant) Date: Wed, 8 May 2019 11:31:17 -0500 Subject: topic: ironic boot from cinder volume(the cinder volume backend is ceph storage) In-Reply-To: References: <54148c1b-ce06-ae7b-1c08-0b5a6ceba4f3@gmail.com> <6fb96df4-9b91-a0f9-933a-a806205409cf@gmail.com> Message-ID: Jason, Thanks for the input on this.  Helps us know the priority of this effort. If you have additional input or are able to help with the effort we welcome your contributions/input. Once we have the spec for this up I will mail the mailing list to keep everyone in the loop. Jay On 5/8/2019 11:24 AM, Jason Anderson wrote: > Thanks Jay! > > So I guess if one wants to use the iSCSI gateway with Ironic now, one > would have to use the 'external' storage interface available since > Rocky and do the poking of Ceph out of band. That won't really work > for our use case, but perhaps others could take advantage. > > I'm very grateful that the Cinder team is spending time on this! > > Cheers, > /Jason -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Wed May 8 16:36:04 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 8 May 2019 11:36:04 -0500 Subject: [nova] Stein regressions In-Reply-To: <27a23eb0-b31b-0f25-db74-bdef81908939@gmail.com> References: <27a23eb0-b31b-0f25-db74-bdef81908939@gmail.com> Message-ID: <61c255ff-942a-60d9-d55c-9df9b7338434@gmail.com> Another update on these now that we're past the summit and PTG. On 4/16/2019 9:30 PM, Matt Riedemann wrote: > > 1. https://bugs.launchpad.net/nova/+bug/1822801 > Done - backport is merged to stable/stein (not yet released). > > 2. https://bugs.launchpad.net/nova/+bug/1824435 > Still no fix proposed for this yet but it is re-createable in devstack. > > 3. https://bugs.launchpad.net/nova/+bug/1825034 > The fix is merged on master, backports proposed [1]. > > 4. https://bugs.launchpad.net/nova/+bug/1825020 Done - backport is merged to stable/stein (not yet released). [1] https://review.opendev.org/#/q/topic:bug/1825034+status:open -- Thanks, Matt From jungleboyj at gmail.com Wed May 8 16:40:20 2019 From: jungleboyj at gmail.com (Jay Bryant) Date: Wed, 8 May 2019 11:40:20 -0500 Subject: topic: ironic boot from cinder volume(the cinder volume backend is ceph storage) In-Reply-To: <3B227E63-CA0E-4F88-8ED9-331008FF008D@cern.ch> References: <54148c1b-ce06-ae7b-1c08-0b5a6ceba4f3@gmail.com> <3B227E63-CA0E-4F88-8ED9-331008FF008D@cern.ch> Message-ID: <6bceda83-eab9-3dd5-8d6b-446395735b36@gmail.com> Tim, This was kind of what I was picturing as the temporary work around for Cinder not supporting this yet. Ideally the user would be able to use Cinder to do all the volume management (create/delete/etc.), but right now the Ceph iSCSI CLI only shows the volumes it created.  This is one of the challenges we have to resolve in adding this support. 
None-the-less users could use a portion of their Ceph storage for boot-from-volume purposes via the Ceph iSCSI CLI until we add the support.  It would just require them to create the volume and set up the iSCSI target on the Ceph iSCSI gateway.  Then the directions you shared for use without Cinder could be used to use the iSCSI Gateway as the target. In the future it should be possible to add those volumes under Cinder Management once we have all the support in place and then the Ceph iSCSI CLI would not need to be used in the future. Jay On 5/8/2019 11:25 AM, Tim Bell wrote: > > I’m not sure you actually need full cinder support, see “Boot Without > Cinder” in > https://docs.openstack.org/ironic/latest/admin/boot-from-volume.html > > (Never tried it though ….) > > Tim > > *From: *Jason Anderson > *Date: *Wednesday, 8 May 2019 at 18:17 > *To: *"openstack-discuss at lists.openstack.org" > , > "jsbryant at electronicjungle.net" > *Subject: *Re: topic: ironic boot from cinder volume(the cinder volume > backend is ceph storage) > > Tim, Jay -- > > I looked in to this recently as it was a use-case some of our HPC > users wanted support for. I noticed that Ceph has the iSCSI gateway, > but my impression was that this wouldn't work without adding some sort > of new driver in Cinder. Is that not true? I thought that Cinder only > Ceph via RBD. I'd be happy to be proven wrong on this. > > Cheers, > > /Jason > > ------------------------------------------------------------------------ > > *From:*Jay Bryant > *Sent:* Wednesday, May 8, 2019 11:04 > *To:* openstack-discuss at lists.openstack.org > *Subject:* Re: topic: ironic boot from cinder volume(the cinder volume > backend is ceph storage) > > Tim, > > Good thought.  That would be an interim solution until we are able to > get the process automated. > > Jay > > On 5/8/2019 10:55 AM, Tim Bell wrote: > > Just brainstorming.... > > > > Would it be possible to set up a couple of VMs as iscsi LIO gateways > by hand while this feature is being developed and using that end point > to boot an Ironic node? You may also be on a late enough version of > Ceph to do it using http://docs.ceph.com/docs/mimic/rbd/iscsi-overview/. > > > > Not self-service but could work for a few cases.. > > > > Tim > > > > -----Original Message----- > > From: Jay Bryant > > Reply-To: "jsbryant at electronicjungle.net" > > > Date: Wednesday, 8 May 2019 at 17:46 > > To: "openstack-discuss at lists.openstack.org" > > > Subject: Re: topic: ironic boot from cinder volume(the cinder volume > backend is ceph storage) > > > >      This is going to require being able to export Ceph volumes via > iSCSI. > >      The Ironic team communicated the importance of this feature to the > >      Cinder team a few months ago. > > > >      We are working on getting this support in place soon but it > probably > >      will not be until the U release. > > > >      Thanks! > > > >      Jay > > > > > >      On 5/8/2019 6:48 AM, 陈杰 wrote: > >      > Nowdays , the opestack rocky release ironic , is support > ironic boot > >      > from cinder volume(the cinder volume backend is ceph > storage)? My goal > >      > is to achieve this. > >      > Who can tell me about this principle? > >      > looking forward to a reply > >      > thank you all. > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dangtrinhnt at gmail.com Wed May 8 17:00:37 2019 From: dangtrinhnt at gmail.com (Trinh Nguyen) Date: Thu, 9 May 2019 02:00:37 +0900 Subject: [openstack-ansible][monasca][zaqar][watcher][searchlight] Retirement of unused OpenStack Ansible roles In-Reply-To: References: <236ef912-21c5-4345-98ce-067499921af1@www.fastmail.com> Message-ID: Hi all, I would love to take care of the searchlight roles. Are there any specific requirements I need to keep in mind? Bests, On Thu, Apr 25, 2019 at 5:50 AM Mohammed Naser wrote: > Hi, > > These roles have been broken for over a year now, some are not even > integrated with the OpenStack Ansible integrated repository. > > I think it's safe to say that for the most part, they have no users or > consumers unless someone has integrated it downstream somewhere and > didn't push that back out. It is a lot of overhead to maintain roles, > we're a small team that has to manage a huge amount of roles and their > integration, while on paper, I'd love for someone to step in and help, > but no one has for over a year. > > If someone wants to step in and get those roles to catch up on all the > technical debt they've accumulated (because when we'd do fixes across > all roles, we would always leave them.. because they always failed > tests..) then we're one revert away from it. I have some thoughts on > how we can resolve this for the future, but they're much more long > term, but for now, the additional workload on our very short resourced > team is a lot. > > Thanks, > Mohammed > > On Wed, Apr 24, 2019 at 8:56 AM Guilherme Steinmüller > wrote: > > > > Hello Witek and Jean-Philippe. > > > > I will hold off the retirement process until the end of PTG. > > > > Just for your information, that's what we have until now > https://review.opendev.org/#/q/topic:retire-osa-unused-roles+(status:open+OR+status:merged) > . > > > > I just -w the monsca roles as they were the only roles someone > manifested interest. > > > > Regards > > > > On Wed, Apr 24, 2019 at 8:14 AM Jean-Philippe Evrard < > jean-philippe at evrard.me> wrote: > >> > >> I am not sure this follows our documented retirement process, and it > seems very early to do so for some roles. > >> I think we should discuss role retirement at the next PTG (if we want > to change that process). > >> > >> In the meantime, I encourage people from the > monasca/zaqar/watcher/searchlight community interested deploying with > openstack-ansible to step up and take over their respective role's > maintainance. > >> > >> Regards, > >> Jean-Philippe Evrard (evrardjp). > >> > > > -- > Mohammed Naser — vexxhost > ----------------------------------------------------- > D. 514-316-8872 > D. 800-910-1726 ext. 200 > E. mnaser at vexxhost.com > W. http://vexxhost.com > > -- *Trinh Nguyen* *www.edlab.xyz * -------------- next part -------------- An HTML attachment was scrubbed... URL: From cgoncalves at redhat.com Wed May 8 17:04:19 2019 From: cgoncalves at redhat.com (Carlos Goncalves) Date: Wed, 8 May 2019 19:04:19 +0200 Subject: OpenStack User Survey 2019 In-Reply-To: References: <5CC0732E.8020601@tipit.net> <74F9B988-972B-422F-94D1-E62A83FD87A7@openstack.org> Message-ID: Hi Allison and Jimmy, In today's Octavia IRC meeting [1], the team agreed on the following two questions we would like to see included in the survey: 1. Which OpenStack load balancing (Octavia) provider drivers would you like to see supported? 2. Which new features would you like to see supported in OpenStack load balancing (Octavia)? 
Please let us know if you have any questions. Thanks, Carlos [1] http://eavesdrop.openstack.org/meetings/octavia/2019/octavia.2019-05-08-16.00.html On Tue, May 7, 2019 at 10:51 PM Allison Price wrote: > > Hi Michael, > > I apologize that the Octavia project team has been unable to submit a question to date. Jimmy posted the User Survey update to the public mailing list to ensure we updated the entire community and that we caught any projects that had not submitted their questions. The User Survey is open all year, and the primary goal is passing operator feedback to the upstream community. > > If the Octavia team - or any OpenStack project team - has a question they would like added (limit of 2 per project), please let Jimmy or myself know. > > Thanks for reaching out, Michael. > > Cheers, > Allison > > > On May 7, 2019, at 3:39 PM, Michael Johnson wrote: > > > > Jimmy & Allison, > > > > As you probably remember from previous year's surveys, the Octavia > > team has been trying to get a question included in the survey for a > > while. > > I have included the response we got the last time we inquired about > > the survey below. We never received a follow up invitation. > > > > I think it would be in the best interest for the community if we > > follow our "Four Opens" ethos in the user survey process, specifically > > the "Open Community" statement, by soliciting survey questions from > > the project teams in an open forum such as the openstack-discuss > > mailing list. > > > > Michael > > > > ----- Last response e-mail ------ > > Jimmy McArthur > > > > Fri, Sep 7, 2018, 5:51 PM > > to Allison, me > > Hey Michael, > > > > The project-specific questions were added in 2017, so likely didn't > > include some new projects. While we asked all projects to participate > > initially, less than a dozen did. We will be sending an invitation for > > new/underrepresented projects in the coming weeks. Please stand by and > > know that we value your feedback and that of the community. > > > > Cheers! > > > > > > > >> On Sat, Apr 27, 2019 at 5:11 PM Allison Price wrote: > >> > >> Hi Michael, > >> > >> We reached out to all of the PTLs who had questions in the 2018 version of the survey to review and update their questions. If there is a project that was missed, we can add it and share anonymized results with the PTLs directly as well as the openstack-discsuss mailing list. > >> > >> If there is a question from the Octavia team, please let us know and we can add it for the 2019 survey. > >> > >> Cheers, > >> Allison > >> > >> > >> > >> On Apr 27, 2019, at 4:01 PM, Michael Johnson wrote: > >> > >> Jimmy, > >> > >> I am curious, how did you reach out the PTLs for project specific > >> questions? The Octavia team didn't receive any e-mail from you or > >> Allison on the topic. > >> > >> Michael > >> > >> > > From lbragstad at gmail.com Wed May 8 17:13:47 2019 From: lbragstad at gmail.com (Lance Bragstad) Date: Wed, 8 May 2019 12:13:47 -0500 Subject: [cinder][ops] Nested Quota Driver Use? In-Reply-To: References: <20190502003249.GA1432@sm-workstation> <20190507142046.GA3999@sm-workstation> Message-ID: On 5/7/19 3:22 PM, Jay Bryant wrote: > > On 5/7/2019 9:20 AM, Sean McGinnis wrote: >> On Fri, May 03, 2019 at 06:58:41PM +0000, Tim Bell wrote: >>> We're interested in the overall functionality but I think unified >>> limits is the place to invest and thus would not have any problem >>> deprecating this driver. >>> >>> We'd really welcome this being implemented across all the projects >>> in a consistent way. 
The sort of functionality proposed in >>> https://techblog.web.cern.ch/techblog/post/nested-quota-models/  >>> would need Nova/Cinder/Manila at miniumum for CERN to switch. >>> >>> So, no objections to deprecation  but strong support to converge on >>> unified limits. >>> >>> Tim >>> >> Thanks Tim, that helps. >> >> Since there wasn't any other feedback, and no one jumping up to say >> they are >> using it today, I have submitted https://review.opendev.org/657511 to >> deprecated the current quota driver so we don't have to try to >> refactor that >> functionality into whatever we need to do for the unified limits >> support. >> >> If anyone has any concerns about this plan, please feel free to raise >> them here >> or on that review. >> >> Thanks! >> Sean > > Sean, > > If I remember correctly, IBM had put some time into trying to fix the > nested quota driver back around the Kilo or Liberty release. I haven't > seen much activity since then. > > I am in support deprecating the driver and going to unified limits > given that that appears to be the general direction of OpenStack. If you happen to notice anyone else contributing to the cinder-specific implementation, feel free to have them reach out to us. If people are interested in developing and adopting unified limits, we're happy to get them up-to-speed on the current approach. > > Jay > > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From cdent+os at anticdent.org Wed May 8 17:15:47 2019 From: cdent+os at anticdent.org (Chris Dent) Date: Wed, 8 May 2019 10:15:47 -0700 (PDT) Subject: [placement][nova][ptg] Summary: Consumer Types In-Reply-To: <93df3b21-149c-d32b-54d0-614597d4d754@gmail.com> References: <1557135206.12068.1@smtp.office365.com> <93df3b21-149c-d32b-54d0-614597d4d754@gmail.com> Message-ID: On Wed, 8 May 2019, Matt Riedemann wrote: > Yup I agree with everything said from a nova perspective. Our public cloud > operators were just asking about leaked allocations and if there was tooling > to report and clean that kind of stuff up. I explained we have the > heal_allocations CLI but that's only going to create allocations for > *instances* and only if those instances aren't deleted, but we don't have > anything in nova that deals with detection and cleanup of leaked allocations, > sort of like what this tooling does [1] but I think is different. I continue to wish that we had (or could chose to make) functionality on the compute node, perhaps in response to a signal of some kind that said: performed a reset of inventory and allocations. So that in case of doubt we can use reality as the authoritative source of truth, not either of the nova or placement dbs. I'm not sure if that's feasible at this stage. I agree that healing allocations for instances that are known to exist is easy, but cleaning up allocations that got left behind is harder. It's simplified somewhat (from nova's perspective) in that there should only ever be one group of allocations (that is, a thing identified by a consumer uuid) for an instance. Right now, you can generate a list of known consumers of compute nodes by doing what you describe: traversing the allocations of each compute node provider. 
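A minimal sketch of that traversal (the placement endpoint URL is a deployment-specific assumption, admin credentials and jq are assumed, and note that in-progress migrations also hold allocations under the migration UUID, so anything this flags is only a *candidate* leak):

  TOKEN=$(openstack token issue -f value -c id)
  PLACEMENT=http://controller:8778/placement   # deployment-specific

  # consumers nova knows about
  openstack server list --all-projects -f value -c ID | sort -u > nova_consumers.txt

  # consumers holding allocations against any resource provider
  for rp in $(curl -s -H "X-Auth-Token: $TOKEN" "$PLACEMENT/resource_providers" \
              | jq -r '.resource_providers[].uuid'); do
      curl -s -H "X-Auth-Token: $TOKEN" "$PLACEMENT/resource_providers/$rp/allocations" \
          | jq -r '.allocations | keys[]'
  done | sort -u > placement_consumers.txt

  # consumers placement knows about but nova does not -> candidate leaks
  comm -13 nova_consumers.txt placement_consumers.txt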
If we ever move to a state where the compute node doesn't provide resources (and thus will have no allocations) we won't be able to do that, and that's one of the reasons why I get resistant when we talk about moving VCPU to NUMA nodes in all cases. Which supports your assertion that maybe some day it would be nice to list allocations by type. Some day. -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent tw: @anticdent From alifshit at redhat.com Wed May 8 18:21:44 2019 From: alifshit at redhat.com (Artom Lifshitz) Date: Wed, 8 May 2019 14:21:44 -0400 Subject: [nova][CI] GPUs in the gate In-Reply-To: <20190508132709.xgq6nz3mqkfw3q5d@yuggoth.org> References: <3587e05d-deab-42ad-9a02-4312ca11760f@www.fastmail.com> <20190508132709.xgq6nz3mqkfw3q5d@yuggoth.org> Message-ID: On Wed, May 8, 2019 at 9:30 AM Jeremy Stanley wrote: > Long shot, but since you just need the feature provided and not the > performance it usually implies, are there maybe any open source > emulators which provide the same instruction set for conformance > testing purposes? Something like that exists for network cards. It's called netdevsim [1], and it's been mentioned in the SRIOV live migration spec [2]. However to my knowledge nothing like that exists for GPUs. [1] https://www.phoronix.com/scan.php?page=news_item&px=Linux-4.16-Networking [2] https://specs.openstack.org/openstack/nova-specs/specs/train/approved/libvirt-neutron-sriov-livemigration.html#testing From aspiers at suse.com Wed May 8 18:27:19 2019 From: aspiers at suse.com (Adam Spiers) Date: Wed, 8 May 2019 19:27:19 +0100 Subject: [nova][all][ptg] Summary: Same-Company Approvals In-Reply-To: References: <20190508143923.bhmla62qi2p7yc7s@yuggoth.org> <20190508154511.njvidentht4d4zim@pacific.linksys.moosehall> <78C13304-E630-43FD-BDA5-0C43FBDA8B29@leafe.com> Message-ID: <20190508182719.6exbju2l3ohskwjt@pacific.linksys.moosehall> Morgan Fainberg wrote: >In general I would rather see trust be pushed forward. The cores are >explicitly trusted individuals within a team. I would encourage setting >this policy aside and revisit if it ever becomes an issue. I think this >policy, preemptive or not, highlights a lack of trust. IMHO it wouldn't highlight a lack of trust if it explicitly said that there is no current problem in the community. But it's not just about trust. There's also the issue of simple honest lack of awareness, even by diligent newbie cores with the very finest of intentions. Honestly, if I hadn't stumbled across this conversation at the PTG, and later became core on a project, it might have never crossed my mind that it might be better in some scenarios to avoid giving W+1 on a review where +2 was only given by colleagues at my company. Indeed, the fact that we currently (and hopefully indefinitely) enjoy the ability to trust the best interests of others cores would probably make me *more* susceptible to accidentally introducing company-oriented bias without realising it. In contrast, if there was an on-boarding document for new cores which raised awareness of this, I would read that when becoming a core, and then vet myself for employer-oriented bias before every +2 and W+1 I subsequently gave. >It is another reason >why Keystone team abolished the rule. AI.kuch prefer trusting the cores >with landing code with or without external/additional input as they feel is >appropriate. > >There are remedies if something lands inappropriately (revert, removal of >core status, etc). 
As stated earlier in the quoted email, this has never or >almost never been an issue. > >With that said, I don't have a strongly vested interest outside of seeing >our community succeeding and being as open/inclusive as possible (since >most contributions I am working on are not subject to this policy). As long >as the policy isn't strictly tribal knowledge, we are headed in the right >direction. Agreed. Any suggestions on how to prevent it being tribal? The only way I can think of is documenting it, but maybe I'm missing a trick. From zbitter at redhat.com Wed May 8 18:27:54 2019 From: zbitter at redhat.com (Zane Bitter) Date: Wed, 8 May 2019 14:27:54 -0400 Subject: [tc] Proposal: restrict TC activities In-Reply-To: <20190504132550.GA28713@shipstone.jp> References: <20190503204942.GB28010@shipstone.jp> <20190504132550.GA28713@shipstone.jp> Message-ID: <630df54a-3645-6319-da88-58f47ae36ca5@redhat.com> On 4/05/19 9:25 AM, Emmet Hikory wrote: > Zhipeng Huang wrote: >> Then it might fit the purpose to rename the technical committee to >> governance committee or other terms. If we have a technical committee not >> investing time to lead in technical evolvement of OpenStack, it just seems >> odd to me. > > OpenStack has a rich governance structure, including at least the > Foundation Board, the User Committee, and the Technical Committee. Within > the context of governance, the Technical Committee is responsible for both > technical governance of OpenStack and governance of the technical community. > It is within that context that "Technical Committee" is the name. > > I also agree that it is important that members of the Technical Committee > are able to invest time to lead in the technical evolution of OpenStack, and > this is a significant reason that I propose that the activities of the TC be > restricted, precisely so that being elected does not mean that one no longer > is able to invest time for this. Could you be more clear about which activities you think should be restricted? Presumably you're arguing that there should be fewer... let's call it "ex officio" responsibilities to being a TC member. The suggestion reads as kind of circular, because you appear to be saying that aspiring TC members should be doing certain kinds of socially useful tasks that are likely to get them elected to the TC, where they will be restricted from doing those tasks in order to make sure they have free time to do the kinds of socially useful things they were doing prior to getting elected to the TC, except that those are now restricted for them. Presumably we're actually talking about different sets of tasks there, but I don't think we can break the loop without being explicit about what they are. >> TC should be a place good developers aspired to, not retired to. BTW this >> is not a OpenStack-only issue but I see across multiple open source >> communities. > > While I agree that it is valuable to have a target for the aspirations > of good developers, I am not convinced that OpenStack can be healthy if we > restrict our aspirations to nine seats. Good news, we have 13 seats ;) > From my perspective, this causes > enough competition that many excellent folk may never be elected, and that > some who wish to see their aspirations fufilled may focus activity in other > communities where it may be easier to achieve an arbitrary title. 
> > Instead, I suggest that developers should aspire to be leaders in the > OpenStack comunuity, and be actively involved in determining the future > technical direction of OpenStack. I just don't think there needs to be > any correlation between this and the mechanics of reviewing changes to the > governance repository. I couldn't agree more that we want as many people as possible to be leaders in the community and not wait to be elected to something. That said, in my personal experience, people just... listen more (for better and worse) to you when you're a TC member, because the election provides social proof that other people are listening to you too. This phenomenon seems unavoidable unless you create separate bodies for technical direction and governance (which I suspect has its own problems, like a tendency for the governance body to become dominated by professional managers). cheers, Zane. From morgan.fainberg at gmail.com Wed May 8 18:50:32 2019 From: morgan.fainberg at gmail.com (Morgan Fainberg) Date: Wed, 8 May 2019 11:50:32 -0700 Subject: [nova][all][ptg] Summary: Same-Company Approvals In-Reply-To: <20190508182719.6exbju2l3ohskwjt@pacific.linksys.moosehall> References: <20190508143923.bhmla62qi2p7yc7s@yuggoth.org> <20190508154511.njvidentht4d4zim@pacific.linksys.moosehall> <78C13304-E630-43FD-BDA5-0C43FBDA8B29@leafe.com> <20190508182719.6exbju2l3ohskwjt@pacific.linksys.moosehall> Message-ID: On Wed, May 8, 2019 at 11:27 AM Adam Spiers wrote: > Morgan Fainberg wrote: > >In general I would rather see trust be pushed forward. The cores are > >explicitly trusted individuals within a team. I would encourage setting > >this policy aside and revisit if it ever becomes an issue. I think this > >policy, preemptive or not, highlights a lack of trust. > > IMHO it wouldn't highlight a lack of trust if it explicitly said that > there is no current problem in the community. > > But it's not just about trust. There's also the issue of simple > honest lack of awareness, even by diligent newbie cores with the very > finest of intentions. > > Honestly, if I hadn't stumbled across this conversation at the PTG, > and later became core on a project, it might have never crossed my > mind that it might be better in some scenarios to avoid giving W+1 on > a review where +2 was only given by colleagues at my company. Indeed, > the fact that we currently (and hopefully indefinitely) enjoy the > ability to trust the best interests of others cores would probably > make me *more* susceptible to accidentally introducing > company-oriented bias without realising it. > > In contrast, if there was an on-boarding document for new cores which > raised awareness of this, I would read that when becoming a core, and > then vet myself for employer-oriented bias before every +2 and W+1 I > subsequently gave. > > >It is another reason > >why Keystone team abolished the rule. AI.kuch prefer trusting the cores > >with landing code with or without external/additional input as they feel > is > >appropriate. > > > >There are remedies if something lands inappropriately (revert, removal of > >core status, etc). As stated earlier in the quoted email, this has never > or > >almost never been an issue. > > > >With that said, I don't have a strongly vested interest outside of seeing > >our community succeeding and being as open/inclusive as possible (since > >most contributions I am working on are not subject to this policy). As > long > >as the policy isn't strictly tribal knowledge, we are headed in the right > >direction. 
> > Agreed. Any suggestions on how to prevent it being tribal? The only > way I can think of is documenting it, but maybe I'm missing a trick. > Unfortunately, in this case it's "tribal" or "documented". No "one weird trick" here as far as I know ;). --Morgan -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Wed May 8 19:19:01 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Wed, 8 May 2019 19:19:01 +0000 Subject: [tc] Proposal: restrict TC activities In-Reply-To: <630df54a-3645-6319-da88-58f47ae36ca5@redhat.com> References: <20190503204942.GB28010@shipstone.jp> <20190504132550.GA28713@shipstone.jp> <630df54a-3645-6319-da88-58f47ae36ca5@redhat.com> Message-ID: <20190508191900.aronojaifbnh26yi@yuggoth.org> On 2019-05-08 14:27:54 -0400 (-0400), Zane Bitter wrote: > On 4/05/19 9:25 AM, Emmet Hikory wrote: > > Zhipeng Huang wrote: > > > Then it might fit the purpose to rename the technical > > > committee to governance committee or other terms. If we have a > > > technical committee not investing time to lead in technical > > > evolvement of OpenStack, it just seems odd to me. > > > > OpenStack has a rich governance structure, including at least > > the Foundation Board, the User Committee, and the Technical > > Committee. Within the context of governance, the Technical > > Committee is responsible for both technical governance of > > OpenStack and governance of the technical community. It is > > within that context that "Technical Committee" is the name. > > > > I also agree that it is important that members of the Technical > > Committee are able to invest time to lead in the technical > > evolution of OpenStack, and this is a significant reason that I > > propose that the activities of the TC be restricted, precisely > > so that being elected does not mean that one no longer is able > > to invest time for this. > > Could you be more clear about which activities you think should be > restricted? Presumably you're arguing that there should be > fewer... let's call it "ex officio" responsibilities to being a TC > member. > > The suggestion reads as kind of circular, because you appear to be > saying that aspiring TC members should be doing certain kinds of > socially useful tasks that are likely to get them elected to the > TC, where they will be restricted from doing those tasks in order > to make sure they have free time to do the kinds of socially > useful things they were doing prior to getting elected to the TC, > except that those are now restricted for them. Presumably we're > actually talking about different sets of tasks there, but I don't > think we can break the loop without being explicit about what they > are. [...] My read was that the community should, each time we assert there's something we want done and we think the TC should also take care of for us, step back and consider that those TC members are already deeply embedded in various parts of our community as well as adjacent communities getting other things done (likely the same things which got them elected to seats on the TC to begin with), and that each new thing we want them to tackle is going to take the place of yet more of those other things they'll cease having time for as a result. 
Taken from another perspective, it's the idea that the TC as a governing body should limit its focus to governance tasks and stop feeling pressured to find yet more initiatives and responsibilities for itself, leaving more time for the folks serving on the TC to also continue doing all manner of other important tasks they feel compelled to do in their capacity as members of the community rather than with their "TC hats" on. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From pawel.konczalski at everyware.ch Wed May 8 19:27:32 2019 From: pawel.konczalski at everyware.ch (Pawel Konczalski) Date: Wed, 8 May 2019 21:27:32 +0200 Subject: Magnum Kubernetes openstack-cloud-controller-manager unable not resolve master node by DNS In-Reply-To: <4FFA2395-960B-4DA7-8481-F2AD93EAB500@stackhpc.com> References: <4FFA2395-960B-4DA7-8481-F2AD93EAB500@stackhpc.com> Message-ID: <8c93a364-030c-d0d2-447b-3e737641d24a@everyware.ch> Hi Bharat, i was able to deploy the Kubernetes cluster with Magnum after update / specify Kubernetes version with the "--labels kube_tag=v1.13.4" parameter. See: # kube_tag https://docs.openstack.org/magnum/latest/user/#kube-tag https://hub.docker.com/r/openstackmagnum/kubernetes-apiserver/tags/ # cloud_provider_tag https://docs.openstack.org/magnum/latest/user/#cloud-provider-tag https://hub.docker.com/r/k8scloudprovider/openstack-cloud-controller-manager/tags/ This may by related with this issue: https://github.com/kubernetes/cloud-provider-openstack/issues/280 # openstack coe cluster template create kubernetes-cluster-template \   --image "Fedora AtomicHost 29" \   --external-network public \   --dns-nameserver 8.8.8.8 \   --master-flavor m1.kubernetes \   --flavor m1.kubernetes \   --coe kubernetes \   --volume-driver cinder \   --network-driver flannel \   --docker-volume-size 25 \   --public \   --labels kube_tag=v1.13.4,cloud_provider_tag=1.13.1 # openstack coe cluster create kubernetes-cluster \   --cluster-template kubernetes-cluster-template \   --master-count 1 \   --node-count 2 \   --keypair mykey # kubectl get pods --all-namespaces -o wide NAMESPACE     NAME                                       READY STATUS    RESTARTS   AGE     IP NODE                                        NOMINATED NODE READINESS GATES kube-system   coredns-dcc6d487d-hxpgq                    1/1 Running   0          7h55m   10.100.9.2 kubernetes-cluster7-sysemevhbq4i-minion-1   kube-system   coredns-dcc6d487d-nkb9p                    1/1 Running   0          7h57m   10.100.78.4 kubernetes-cluster7-sysemevhbq4i-minion-0   kube-system   heapster-796547984d-6wgwp                  1/1 Running   0          7h57m   10.100.78.2 kubernetes-cluster7-sysemevhbq4i-minion-0   kube-system   kube-dns-autoscaler-7865df57cd-ln4cc       1/1 Running   0          7h57m   10.100.78.3 kubernetes-cluster7-sysemevhbq4i-minion-0   kube-system   kubernetes-dashboard-f5496d66d-tdbvv       1/1 Running   0          7h57m   10.100.78.5 kubernetes-cluster7-sysemevhbq4i-minion-0   kube-system   openstack-cloud-controller-manager-9s5wh   1/1 Running   3          7h57m   10.0.0.10 kubernetes-cluster7-sysemevhbq4i-master-0   Thank you Pawel Am 08.05.19 um 8:40 vorm. schrieb Bharat Kunwar: > Try using the latest version, think there is an OCCM_TAG. 
> > Sent from my iPhone > >> On 7 May 2019, at 20:10, Pawel Konczalski wrote: >> >> Hi, >> >> i try to deploy a Kubernetes cluster with OpenStack Magnum but the openstack-cloud-controller-manager pod fails to resolve the master node hostname. >> >> Does magnum require further parameter to configure the DNS names of the master and minions? DNS resolution in the VMs works fine. Currently there is no Designate installed in the OpenStack setup. >> >> >> openstack coe cluster template create kubernetes-cluster-template1 \ >> --image Fedora-AtomicHost-29-20190429.0.x86_64 \ >> --external-network public \ >> --dns-nameserver 8.8.8.8 \ >> --master-flavor m1.kubernetes \ >> --flavor m1.kubernetes \ >> --coe kubernetes \ >> --volume-driver cinder \ >> --network-driver flannel \ >> --docker-volume-size 25 >> >> openstack coe cluster create kubernetes-cluster1 \ >> --cluster-template kubernetes-cluster-template1 \ >> --master-count 1 \ >> --node-count 2 \ >> --keypair mykey >> >> >> # kubectl get pods --all-namespaces -o wide >> NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE >> kube-system coredns-78df4bf8ff-mjp2c 0/1 Pending 0 36m >> kube-system heapster-74f98f6489-tgtzl 0/1 Pending 0 36m >> kube-system kube-dns-autoscaler-986c49747-wrvz4 0/1 Pending 0 36m >> kube-system kubernetes-dashboard-54cb7b5997-sk5pj 0/1 Pending 0 36m >> kube-system openstack-cloud-controller-manager-dgk64 0/1 CrashLoopBackOff 11 36m kubernetes-cluster1-vulg5fz6hg2n-master-0 >> >> >> # kubectl -n kube-system logs openstack-cloud-controller-manager-dgk64 >> Error from server: Get https://kubernetes-cluster1-vulg5fz6hg2n-master-0:10250/containerLogs/kube-system/openstack-cloud-controller-manager-dgk64/openstack-cloud-controller-manager: dial tcp: lookup kubernetes-cluster1-vulg5fz6hg2n-master-0 on 8.8.8.8:53: no such host >> >> >> BR >> >> Pawel -------------- next part -------------- A non-text attachment was scrubbed... Name: smime.p7s Type: application/pkcs7-signature Size: 5227 bytes Desc: not available URL: From openstack at nemebean.com Wed May 8 19:42:00 2019 From: openstack at nemebean.com (Ben Nemec) Date: Wed, 8 May 2019 14:42:00 -0500 Subject: [oslo] PTG Summary Message-ID: Hi, You can find the raw notes on the etherpad (https://etherpad.openstack.org/p/oslo-train-topics), but hopefully this will be an easier to read/understand summary. Pluggable Policy ---------------- Spec: https://review.opendev.org/#/c/578719/ Since this sort of ran out of steam last cycle, we discussed the option of not actually making it pluggable and just explicitly adding support for other policy backends. The specific one that seems to be of interest is Open Policy Agent. To do this we would add an option to enable OPA mode, where all policy checks would be passed through to OPA by default. An OPACheck class would also be added to facilitate migration (as a rule is added to OPA, switch the policy to OPACheck. Once all rules are present, remove the policy file and just turn on the OPA mode). However, after some further investigation by Patrick East, it was not clear if users were asking for this or if the original spec was more of a "this might be useful" thing. He's following up with some OPA users to see if they would use such a feature, but at this point it's not clear whether there is enough demand to justify spending time on it. Image Encryption/Decryption Library ----------------------------------- I mention this mostly because the current plan is _not_ to create a new Oslo library to enable the feature. 
The common code between services is expected to live in os-brick, and there does not appear to be a need to create a new encryption library to support this (yay!). oslo.service SIGHUP bug ----------------------- This is a problem a number of people have run into recently and there's been some ongoing, but spotty, discussion of how to deal with it. In Denver we were able to have some face-to-face discussions and hammer out a plan to get this fixed. I think we have a fix identified, and now we just need to get it proposed and tested so we don't regress this in the future. Most of the prior discussion and a previously proposed fix are at https://review.opendev.org/#/c/641907/ so if you want to follow this that's the place to do it. In case anyone is interested, it looks like this is a bug that was introduced with mutable config. Mutable config requires a different type of service restart, and that was never implemented. Now that most services are using mutable config, this is much bigger problem. Unified Limits and Policy ------------------------- I won't try to cover everything in detail here, but good progress was made on both of these topics. There isn't much to do from the Oslo side for the policy changes, but we identified a plan for an initial implementation of oslo.limit. There was general agreement that we don't necessarily have to get it 100% right on the first attempt, we just need to get something in the repo that people can start prototyping with. Until we release a 1.0 we aren't committed to any API, so we have flexibility to iterate. For more details, see: https://etherpad.openstack.org/p/ptg-train-xproj-nova-keystone oslo.service profiling and pypy ------------------------------- Oslo has dropped support for pypy in general due to lack of maintainers, so although the profiling work has apparently broken oslo.service under pypy this isn't something we're likely to address. Based on our conversation at the PTG game night, it sounds like this isn't a priority anymore anyway because pypy didn't have the desired performance improvement. oslo.privsep eventlet timeout ----------------------------- AFAICT, oslo.privsep only uses eventlet at all if monkey-patching is enabled (and then only to make sure it returns the right type of pipe for the environment). It's doubtful any eventlet exceptions are being raised from the privsep code, and even if they are they would go away once monkey-patching in the calling service is disabled. Privsep is explicitly not depending on eventlet for any of its functionality so services should be able to freely move away from eventlet if they wish. Retrospective ------------- In general, we got some major features implemented that unblocked things either users or services were asking for. We did add two cores during the cycle, but we also lost a long-time Oslo core and some of the other cores are being pulled away on other projects. So far this has probably resulted in a net loss in review capacity. As a result, our primary actions out of this were to continue watching for new candidates to join the Oslo team. We have at least one person we are working closely with and a number of other people approached me at the event with interest in contributing to one or more Oslo projects. So while this cycle was a bit of a mixed bag, I have a cautiously optimistic view of the future. Service Healthchecks and Metrics -------------------------------- Had some initial hallway track discussions about this. 
The self-healing SIG is looking into ways to improve the healthcheck and metric situation in OpenStack, and some of them may require additions or changes in Oslo. There is quite a bit of discussion (not all of which I have read yet) related to this on https://review.opendev.org/#/c/653707/ On the metrics side, there are some notes on the SIG etherpad (currently around line 209): https://etherpad.openstack.org/p/DEN-self-healing-SIG It's still a bit early days for both of these things so plans may change, but it seems likely that Oslo will be involved to some extent. Stay tuned. Endgame ------- No spoilers, I promise. If you made it all the way here then thanks and congrats. :-) I hope this was helpful, and if you have any thoughts about anything above please let me know. Thanks. -Ben From sundar.nadathur at intel.com Wed May 8 19:53:27 2019 From: sundar.nadathur at intel.com (Nadathur, Sundar) Date: Wed, 8 May 2019 19:53:27 +0000 Subject: [placement][nova][ptg] resource provider affinity In-Reply-To: <5fd214e8-4822-53a5-a7d6-622c5133a26f@fried.cc> References: <1CC272501B5BC543A05DB90AA509DED527552AD6@fmsmsx122.amr.corp.intel.com> <97bd8e53-0285-1c92-845f-21098b0b0e38@gmail.com> <21aa22e7-be7d-8ecf-b5bd-9c6afcd789f5@fried.cc> <27624C30-2BB6-43DF-9613-783674389C0B@fried.cc> <1556631941.24201.1@smtp.office365.com> <264f10b8-05dc-5280-28af-1f29cae91821@hco.ntt.co.jp> <4aa76244-fce0-86f3-a6f5-cd7f4d8cb2f0@fried.cc> <03922b54-994e-dcae-8543-7c9c2f75b87d@hco.ntt.co.jp> <5fd214e8-4822-53a5-a7d6-622c5133a26f@fried.cc> Message-ID: <1CC272501B5BC543A05DB90AA509DED527557F03@fmsmsx122.amr.corp.intel.com> Thanks, Eric and Chris. Can this scheme address this use case? I have a set of compute hosts, each with several NICs of type T. Each NIC has a set of PFs: PF1, PF2, .... Each PF is a resource provider, and each has a separate custom RC: CUSTOM_RC_PF1, CUSTOM_RC_PF2, ... . The VFs are inventories of the associated PF's RC. Provider networks etc. are traits on that PF. The use case is to schedule a VM with several Neutron ports coming from the same NIC card and tied to specific networks. Let us say we (somehow) translate this to a set of request groups like this: resources_T1:CUSTOM_RC_PF1 = 2 # Note: T is the NIC name, and we are asking for VFs as resources. traits_T1:CUSTOM_TRAIT_MYNET1 = required resources_T2:CUSTOM_RC_PF2 = 1 traits_T2:CUSTOM_TRAIT_MYNET2 = required "same_subtree=%s" % ','.join(suffix for suffix in all_suffixes if suffix.startswith('T')) Will this ensure that all allocations come from the same NIC card? Do I have to create a 'resourceless RP' for the NIC card that contains the individual PF RPs as children nodes? P.S.: Ignore the comments I added to https://storyboard.openstack.org/#!/story/2005575#comment-122255. Regards, Sundar > -----Original Message----- > From: Eric Fried > Sent: Saturday, May 4, 2019 3:57 PM > To: openstack-discuss at lists.openstack.org > Subject: Re: [placement][nova][ptg] resource provider affinity > > For those of you following along at home, we had a design session a couple of > hours ago and hammered out the broad strokes of this work, including rough > prioritization of the various pieces. Chris has updated the story [1] with a > couple of notes; expect details and specs to emerge therefrom. 
> > efried > > [1] https://storyboard.openstack.org/#!/story/2005575 From openstack at nemebean.com Wed May 8 20:04:21 2019 From: openstack at nemebean.com (Ben Nemec) Date: Wed, 8 May 2019 15:04:21 -0500 Subject: [nova][all][ptg] Summary: Same-Company Approvals In-Reply-To: References: <20190508143923.bhmla62qi2p7yc7s@yuggoth.org> <20190508154511.njvidentht4d4zim@pacific.linksys.moosehall> <78C13304-E630-43FD-BDA5-0C43FBDA8B29@leafe.com> <20190508182719.6exbju2l3ohskwjt@pacific.linksys.moosehall> Message-ID: <4bb90a56-9e01-8935-9d4a-51fb5a61145d@nemebean.com> On 5/8/19 1:50 PM, Morgan Fainberg wrote: > > > On Wed, May 8, 2019 at 11:27 AM Adam Spiers > wrote: > > Morgan Fainberg > wrote: > >In general I would rather see trust be pushed forward. The cores are > >explicitly trusted individuals within a team. I would encourage > setting > >this policy aside and revisit if it ever becomes an issue. I think > this > >policy, preemptive or not, highlights a lack of trust. > > IMHO it wouldn't highlight a lack of trust if it explicitly said that > there is no current problem in the community. > > But it's not just about trust.  There's also the issue of simple > honest lack of awareness, even by diligent newbie cores with the very > finest of intentions. > > Honestly, if I hadn't stumbled across this conversation at the PTG, > and later became core on a project, it might have never crossed my > mind that it might be better in some scenarios to avoid giving W+1 on > a review where +2 was only given by colleagues at my company.  Indeed, > the fact that we currently (and hopefully indefinitely) enjoy the > ability to trust the best interests of others cores would probably > make me *more* susceptible to accidentally introducing > company-oriented bias without realising it. > > In contrast, if there was an on-boarding document for new cores which > raised awareness of this, I would read that when becoming a core, and > then vet myself for employer-oriented bias before every +2 and W+1 I > subsequently gave. > > >It is another reason > >why Keystone team abolished the rule.  AI.kuch prefer trusting the > cores > >with landing code with or without external/additional input as > they feel is > >appropriate. > > > >There are remedies if something lands inappropriately (revert, > removal of > >core status, etc). As stated earlier in the quoted email, this has > never or > >almost never been an issue. > > > >With that said, I don't have a strongly vested interest outside of > seeing > >our community succeeding and being as open/inclusive as possible > (since > >most contributions I am working on are not subject to this > policy). As long > >as the policy isn't strictly tribal knowledge, we are headed in > the right > >direction. > > Agreed.  Any suggestions on how to prevent it being tribal?  The only > way I can think of is documenting it, but maybe I'm missing a trick. > > > Unfortunately, in this case it's "tribal" or "documented". No "one weird > trick" here as far as I know ;). Two cores, one company. You won't believe what happens next! 
/me goes back to daydreaming about working on a project with enough contributors for this to be a problem :-) From marcin.juszkiewicz at linaro.org Wed May 8 20:17:19 2019 From: marcin.juszkiewicz at linaro.org (Marcin Juszkiewicz) Date: Wed, 8 May 2019 22:17:19 +0200 Subject: [cinder] Python3 requirements for Train In-Reply-To: References: Message-ID: <1fa6b516-ad4e-a4dc-cac1-6b72e9b1846b@linaro.org> On 08.05.2019 at 17:04, Walter Boring wrote: > The train release is going to be the last release of OpenStack with > python 2 support. Train also is going to require supporting python > 3.6 and 3.7. This means that we should be enabling and or switching > over all of our 3rd party CI runs to python 3 to ensure that our > drivers and all of their required libraries run properly in a python > 3.6/3.7 environment. This will help driver maintainers discover any > python3 incompatibilities with their driver as well as any required > libraries. At the PTG in Denver, the cinder team agreed that we > wanted driver CI systems to start using python3 by milestone 2 for > Train. This would be the July 22-26th time frame [1]. Added cinder to the list of 'things that may break' projects then. I am working on switching Kolla to use only Python 3 in Debian/Ubuntu based images. Stopped counting projects I had to patch ;( From mriedemos at gmail.com Wed May 8 20:41:25 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 8 May 2019 15:41:25 -0500 Subject: [nova][ptg] Summary: Implicit trait-based filters In-Reply-To: References: Message-ID: <3dadf184-e889-0975-0d55-cc6066a122a8@gmail.com> On 5/6/2019 1:44 PM, Eric Fried wrote: > Addendum: > There's another implicit trait-based filter that bears mentioning: > Excluding disabled compute hosts. > > We have code that disables a compute service when "something goes wrong" > in various ways. This code should decorate the compute node's resource > provider with a COMPUTE_SERVICE_DISABLED trait, and every GET > /allocation_candidates request should include > ?required=!COMPUTE_SERVICE_DISABLED, so that we don't retrieve > allocation candidates for disabled hosts. > > mriedem has started to prototype the code for this [1]. > > Action: Spec to be written. Code to be polished up. Possibly aspiers to > be involved in this bit as well. > > efried > > [1]https://review.opendev.org/#/c/654596/ Here is the spec [1]. There are noted TODOs and quite a few alternatives listed, mostly alternatives to the proposed design and what's in my PoC. One thing my PoC didn't cover was the service group API automatically reporting a service as up or down. I think that will have to be incorporated into this, but how best to do that without having this 'disabled' trait management everywhere might be tricky. My PoC tries to make the compute the single place we manage the trait, but that's also problematic if we lose a race with the API to disable a compute before the compute dies, or if MQ drops the call, etc. We might need/want to hook into the update_available_resource periodic to heal/sync the trait if we have an issue like that, or on startup during upgrade, and we likely also need a CLI to sync the trait status manually - at least to aid with the upgrade. Who knew that managing a status reporting daemon could be complicated (oh right everyone).
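For anyone who wants to poke at this by hand, roughly what the trait juggling looks like at the placement REST level ($TOKEN is an admin token, $PLACEMENT the placement endpoint, $RP_UUID the compute node provider; the generation value and resource amounts are made up). Two caveats: the PUT shown is the generation-checked "replace all traits" call, so real code has to re-read the generation and retry on a 409; and placement only accepts standard os-traits names or CUSTOM_* traits, so until the new trait exists in os-traits a hand test would need something like CUSTOM_COMPUTE_SERVICE_DISABLED instead of the name used in this thread.

  # what traits (and generation) does the compute node provider have now?
  curl -s -H "X-Auth-Token: $TOKEN" -H "OpenStack-API-Version: placement 1.22" \
      "$PLACEMENT/resource_providers/$RP_UUID/traits"
  # -> {"resource_provider_generation": 5, "traits": ["HW_CPU_X86_AVX2"]}

  # "disable" the host: PUT back the full trait list plus the new trait
  curl -s -X PUT -H "X-Auth-Token: $TOKEN" \
      -H "OpenStack-API-Version: placement 1.22" -H "Content-Type: application/json" \
      -d '{"resource_provider_generation": 5, "traits": ["HW_CPU_X86_AVX2", "COMPUTE_SERVICE_DISABLED"]}' \
      "$PLACEMENT/resource_providers/$RP_UUID/traits"

  # scheduler side: forbidden traits ("!") need placement microversion >= 1.22
  curl -s -H "X-Auth-Token: $TOKEN" -H "OpenStack-API-Version: placement 1.22" \
      "$PLACEMENT"'/allocation_candidates?resources=VCPU:1,MEMORY_MB:2048,DISK_GB:10&required=!COMPUTE_SERVICE_DISABLED'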
[1] https://review.opendev.org/#/c/657884/ -- Thanks, Matt From joseph.davis at suse.com Wed May 8 21:23:22 2019 From: joseph.davis at suse.com (Joseph Davis) Date: Wed, 8 May 2019 14:23:22 -0700 Subject: [telemetry] Team meeting agenda for tomorrow In-Reply-To: References: Message-ID: <14ff728c-f19e-e869-90b1-4ff37f7170af@suse.com> On 5/8/19 7:12 AM, openstack-discuss-request at lists.openstack.org wrote: > Hello Trinh, > Where does the meeting happen? Will it be via IRC Telemetry channel? Or, in > the Etherpad (https://etherpad.openstack.org/p/telemetry-meeting-agenda)? I > would like to discuss and understand a bit better the context behind > the Telemetry > events deprecation. Unfortunately, I have a conflict at that time and will not be able to attend. I do have a little bit of context on the Events deprecation to share. First, you will note the commit message from the commit [0] when Events were deprecated: " Deprecate event subsystem This subsystem has never been finished and is not maintained. Deprecate it for future removal. " I got the impression from jd at the time that there were a number of features in Telemetry, including Panko, that were not really "finished" and that the engineers who had worked on them had moved on to other things, so the features had become unsupported.  In late 2018 there was an effort to clean up things that were not well maintained or didn't fit the direction of Telemetry. See also: https://julien.danjou.info/lessons-from-openstack-telemetry-deflation/ Events is one feature that often gets requested, but the use cases and demand for it are not expressed strongly or well understood by most people.  If the Telemetry project has demand to de-deprecate Event handling (including Panko), I'd suggest a review of the requirements for event handling and possibly choosing a champion for maintaining the Panko service. Also note: over in Monasca we have a spec [1] for handling Events ingestion which I hope we will be completing in Train.  Contributions and comments welcome. :) joseph [0] https://github.com/openstack/ceilometer/commit/8a0245a5b3e1357d35ad6653be37ca01176577e4 [1] https://github.com/openstack/monasca-specs/blob/master/specs/stein/approved/monasca-events-listener.rst > > On Wed, May 8, 2019 at 12:19 AM Trinh Nguyen wrote: > >> Hi team, >> >> As planned, we will have a team meeting at 02:00 UTC, May 9th on >> #openstack-telemetry to discuss what we gonna do for the next milestone >> (Train-1) and continue what we left off from the last meeting. >> >> I put here [1] the agenda thinking that it should be fine for an hour >> meeting. If you have anything to talk about, please put it there too. >> >> [1]https://etherpad.openstack.org/p/telemetry-meeting-agenda >> >> >> Bests, >> >> -- >> *Trinh Nguyen* >> *www.edlab.xyz* >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From allison at openstack.org Wed May 8 21:30:31 2019 From: allison at openstack.org (Allison Price) Date: Wed, 8 May 2019 16:30:31 -0500 Subject: OpenStack User Survey 2019 In-Reply-To: References: <5CC0732E.8020601@tipit.net> <74F9B988-972B-422F-94D1-E62A83FD87A7@openstack.org> Message-ID: Hi Carlos, Thank you for providing these two questions. We can get them both added, but I did have a question. Are both of these questions intended to be open ended with a text box for respondents to fill in their answers? Or do you want to provide answer choices? 
(thinking for the first question in particular) With any multiple choice question, an Other option can be included that will trigger a text box to be completed. Thanks! Allison > On May 8, 2019, at 12:04 PM, Carlos Goncalves wrote: > > Hi Allison and Jimmy, > > In today's Octavia IRC meeting [1], the team agreed on the following > two questions we would like to see included in the survey: > > 1. Which OpenStack load balancing (Octavia) provider drivers would you > like to see supported? > 2. Which new features would you like to see supported in OpenStack > load balancing (Octavia)? > > Please let us know if you have any questions. > > Thanks, > Carlos > > [1] http://eavesdrop.openstack.org/meetings/octavia/2019/octavia.2019-05-08-16.00.html > > > On Tue, May 7, 2019 at 10:51 PM Allison Price wrote: >> >> Hi Michael, >> >> I apologize that the Octavia project team has been unable to submit a question to date. Jimmy posted the User Survey update to the public mailing list to ensure we updated the entire community and that we caught any projects that had not submitted their questions. The User Survey is open all year, and the primary goal is passing operator feedback to the upstream community. >> >> If the Octavia team - or any OpenStack project team - has a question they would like added (limit of 2 per project), please let Jimmy or myself know. >> >> Thanks for reaching out, Michael. >> >> Cheers, >> Allison >> >>> On May 7, 2019, at 3:39 PM, Michael Johnson wrote: >>> >>> Jimmy & Allison, >>> >>> As you probably remember from previous year's surveys, the Octavia >>> team has been trying to get a question included in the survey for a >>> while. >>> I have included the response we got the last time we inquired about >>> the survey below. We never received a follow up invitation. >>> >>> I think it would be in the best interest for the community if we >>> follow our "Four Opens" ethos in the user survey process, specifically >>> the "Open Community" statement, by soliciting survey questions from >>> the project teams in an open forum such as the openstack-discuss >>> mailing list. >>> >>> Michael >>> >>> ----- Last response e-mail ------ >>> Jimmy McArthur >>> >>> Fri, Sep 7, 2018, 5:51 PM >>> to Allison, me >>> Hey Michael, >>> >>> The project-specific questions were added in 2017, so likely didn't >>> include some new projects. While we asked all projects to participate >>> initially, less than a dozen did. We will be sending an invitation for >>> new/underrepresented projects in the coming weeks. Please stand by and >>> know that we value your feedback and that of the community. >>> >>> Cheers! >>> >>> >>> >>>> On Sat, Apr 27, 2019 at 5:11 PM Allison Price wrote: >>>> >>>> Hi Michael, >>>> >>>> We reached out to all of the PTLs who had questions in the 2018 version of the survey to review and update their questions. If there is a project that was missed, we can add it and share anonymized results with the PTLs directly as well as the openstack-discsuss mailing list. >>>> >>>> If there is a question from the Octavia team, please let us know and we can add it for the 2019 survey. >>>> >>>> Cheers, >>>> Allison >>>> >>>> >>>> >>>> On Apr 27, 2019, at 4:01 PM, Michael Johnson wrote: >>>> >>>> Jimmy, >>>> >>>> I am curious, how did you reach out the PTLs for project specific >>>> questions? The Octavia team didn't receive any e-mail from you or >>>> Allison on the topic. >>>> >>>> Michael >>>> >>>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From openstack at fried.cc Wed May 8 21:31:20 2019 From: openstack at fried.cc (Eric Fried) Date: Wed, 8 May 2019 16:31:20 -0500 Subject: [placement][nova][ptg] resource provider affinity In-Reply-To: <1CC272501B5BC543A05DB90AA509DED527557F03@fmsmsx122.amr.corp.intel.com> References: <97bd8e53-0285-1c92-845f-21098b0b0e38@gmail.com> <21aa22e7-be7d-8ecf-b5bd-9c6afcd789f5@fried.cc> <27624C30-2BB6-43DF-9613-783674389C0B@fried.cc> <1556631941.24201.1@smtp.office365.com> <264f10b8-05dc-5280-28af-1f29cae91821@hco.ntt.co.jp> <4aa76244-fce0-86f3-a6f5-cd7f4d8cb2f0@fried.cc> <03922b54-994e-dcae-8543-7c9c2f75b87d@hco.ntt.co.jp> <5fd214e8-4822-53a5-a7d6-622c5133a26f@fried.cc> <1CC272501B5BC543A05DB90AA509DED527557F03@fmsmsx122.amr.corp.intel.com> Message-ID: <1934f31d-da89-071f-d667-c36d965851ae@fried.cc> Sundar- > I have a set of compute hosts, each with several NICs of type T. Each NIC has a set of PFs: PF1, PF2, .... Each PF is a resource provider, and each has a separate custom RC: CUSTOM_RC_PF1, CUSTOM_RC_PF2, ... . The VFs are inventories of the associated PF's RC. Provider networks etc. are traits on that PF. It would be weird for the inventories to be called PF* if they're inventories of VF. But mainly: why the custom resource classes? The way "resourceless RP" + "same_subtree" is designed to work is best explained if I model your use case with standard resource classes instead: CN | +---NIC1 (trait: I_AM_A_NIC) | | | +-----PF1_1 (trait: CUSTOM_PHYSNET1, inventory: VF=4) | | | +-----PF1_2 (trait: CUSTOM_PHYSNET2, inventory: VF=4) | +---NIC2 (trait: I_AM_A_NIC) | +-----PF2_1 (trait: CUSTOM_PHYSNET1, inventory: VF=4) | +-----PF2_2 (trait: CUSTOM_PHYSNET2, inventory: VF=4) Now if I say: ?resources_T1=VF:1 &required_T1=CUSTOM_PHYSNET1 &resources_T2=VF:1 &required_T2=CUSTOM_PHYSNET2 &required_T3=I_AM_A_NIC &same_subtree=','.join([suffix for suffix in suffixes if suffix.startswith('_T')]) (i.e. '_T1,_T2,_T3') ...then I'll get two candidates: - {PF1_1: VF=1, PF1_2: VF=1} <== i.e. both from NIC1 - {PF2_1: VF=1, PF2_2: VF=1} <== i.e. both from NIC2 ...and no candidates where one VF is from each NIC. IIUC this is how you wanted it. ============== With the custom resource classes, I'm having a hard time understanding the model. How unique are the _PF$N bits? Do they repeat (a) from one NIC to the next? (b) From one host to the next? (c) Never? The only thing that begins to make sense is (a), because (b) and (c) would lead to skittles. So assuming (a), the model would look something like: CN | +---NIC1 (trait: I_AM_A_NIC) | | | +-----PF1_1 (trait: CUSTOM_PHYSNET1, inventory: CUSTOM_PF1_VF=4) | | | +-----PF1_2 (trait: CUSTOM_PHYSNET2, inventory: CUSTOM_PF2_VF=4) | +---NIC2 (trait: I_AM_A_NIC) | +-----PF2_1 (trait: CUSTOM_PHYSNET1, inventory: CUSTOM_PF1_VF=4) | +-----PF2_2 (trait: CUSTOM_PHYSNET2, inventory: CUSTOM_PF2_VF=4) Now you could get the same result with (essentially) the same request as above: ?resources_T1=CUSTOM_PF1_VF:1 &required_T1=CUSTOM_PHYSNET1 &resources_T2=CUSTOM_PF2_VF:1 &required_T2=CUSTOM_PHYSNET2 &required_T3=I_AM_A_NIC &same_subtree=','.join([suffix for suffix in suffixes if suffix.startswith('_T')]) (i.e. 
'_T1,_T2,_T3') ==> - {PF1_1: CUSTOM_PF1_VF=1, PF1_2: CUSTOM_PF2_VF=1} - {PF2_1: CUSTOM_PF1_VF=1, PF2_2: CUSTOM_PF2_VF=1} ...except that in this model, PF$N corresponds to PHYSNET$N, so you wouldn't actually need the required_T$N=CUSTOM_PHYSNET$N to get the same result: ?resources_T1=CUSTOM_PF1_VF:1 &resources_T2=CUSTOM_PF2_VF:1 &required_T3=I_AM_A_NIC &same_subtree=','.join([suffix for suffix in suffixes if suffix.startswith('_T')]) (i.e. '_T1,_T2,_T3') ...because you're effectively encoding the physnet into the RC. Which is not good IMO. But either way... > Do I have to create a 'resourceless RP' for the NIC card that contains the individual PF RPs as children nodes? ...if you want to be able to request this kind of affinity, then yes, you do (unless there's some consumable resource on the NIC, in which case it's not resourceless, but the spirit is the same). This is exactly what these features are being designed for. Thanks, efried . From gouthampravi at gmail.com Wed May 8 23:21:41 2019 From: gouthampravi at gmail.com (Goutham Pacha Ravi) Date: Wed, 8 May 2019 16:21:41 -0700 Subject: [nova][all][ptg] Summary: Same-Company Approvals In-Reply-To: References: Message-ID: On Tue, May 7, 2019 at 1:08 PM Jay Bryant wrote: > > All, > > Cinder has been working with the same unwritten rules for quite some time as well with minimal issues. > > I think the concerns about not having it documented are warranted. We have had question about it in the past with no documentation to point to. It is more or less lore that has been passed down over the releases. :-) > > At a minimum, having this e-mail thread is helpful. If, however, we decide to document it I think we should have it consistent across the teams that use the rule. I would be happy to help draft/review any such documentation. Chiming in to say the manila community adopted a review policy during Stein release - most of the review policy was what we followed prior, without explicitly writing them down: https://docs.openstack.org/manila/latest/contributor/manila-review-policy.html. Here's a snip/snap from that policy that is relevant to this discussion: Previously, the manila core team informally enforced a code review convention that each code change be reviewed and merged by reviewers of different affiliations. This was followed because the OpenStack Technical Committee used the diversity of affiliation of the core reviewer team as a metric for maturity of the project. However, since the Rocky release cycle, the TC has changed its view on the subject 3 4. We believe this is a step in the right direction. While there is no strict requirement that two core reviewers accepting a code change have different affiliations. Other things being equal, we will continue to informally encourage organizational diversity by having core reviewers from different organizations. Core reviewers have the professional responsibility of avoiding conflicts of interest. > > Jay > > On 5/4/2019 8:19 PM, Morgan Fainberg wrote: > > > > On Sat, May 4, 2019, 16:48 Eric Fried wrote: >> >> (NB: I tagged [all] because it would be interesting to know where other >> teams stand on this issue.) >> >> Etherpad: https://etherpad.openstack.org/p/nova-ptg-train-governance >> >> Summary: >> - There is a (currently unwritten? at least for Nova) rule that a patch >> should not be approved exclusively by cores from the same company. 
This >> is rife with nuance, including but not limited to: >> - Usually (but not always) relevant when the patch was proposed by >> member of same company >> - N/A for trivial things like typo fixes >> - The issue is: >> - Should the rule be abolished? and/or >> - Should the rule be written down? >> >> Consensus (not unanimous): >> - The rule should not be abolished. There are cases where both the >> impetus and the subject matter expertise for a patch all reside within >> one company. In such cases, at least one core from another company >> should still be engaged and provide a "procedural +2" - much like cores >> proxy SME +1s when there's no core with deep expertise. >> - If there is reasonable justification for bending the rules (e.g. typo >> fixes as noted above, some piece of work clearly not related to the >> company's interest, unwedging the gate, etc.) said justification should >> be clearly documented in review commentary. >> - The rule should not be documented (this email notwithstanding). This >> would either encourage loopholing or turn into a huge detailed legal >> tome that nobody will read. It would also *require* enforcement, which >> is difficult and awkward. Overall, we should be able to trust cores to >> act in good faith and in the appropriate spirit. >> >> efried >> . > > > Keystone used to have the same policy outlined in this email (with much of the same nuance and exceptions). Without going into crazy details (as the contributor and core numbers went down), we opted to really lean on "Overall, we should be able to trust cores to act in good faith". We abolished the rule and the cores always ask for outside input when the familiarity lies outside of the team. We often also pull in cores more familiar with the code sometimes ending up with 3x+2s before we workflow the patch. > > Personally I don't like the "this is an unwritten rule and it shouldn't be documented"; if documenting and enforcement of the rule elicits worry of gaming the system or being a dense some not read, in my mind (and experience) the rule may not be worth having. I voice my opinion with the caveat that every team is different. If the rule works, and helps the team (Nova in this case) feel more confident in the management of code, the rule has a place to live on. What works for one team doesn't always work for another. From rafaelweingartner at gmail.com Thu May 9 00:45:38 2019 From: rafaelweingartner at gmail.com (=?UTF-8?Q?Rafael_Weing=C3=A4rtner?=) Date: Wed, 8 May 2019 21:45:38 -0300 Subject: [telemetry] Team meeting agenda for tomorrow In-Reply-To: <14ff728c-f19e-e869-90b1-4ff37f7170af@suse.com> References: <14ff728c-f19e-e869-90b1-4ff37f7170af@suse.com> Message-ID: > > Unfortunately, I have a conflict at that time and will not be able to > attend. > > I do have a little bit of context on the Events deprecation to share. > > First, you will note the commit message from the commit [0] when Events > were deprecated: > > " > > Deprecate event subsystem > > This subsystem has never been finished and is not maintained. > Deprecate it for future removal. > > " > > I got the impression from jd at the time that there were a number of > features in Telemetry, > > including Panko, that were not really "finished" and that the engineers > who had worked on them > > had moved on to other things, so the features had become unsupported. In > late 2018 there was > > an effort to clean up things that were not well maintained or didn't fit > the direction of Telemetry. 
> > See also: > https://julien.danjou.info/lessons-from-openstack-telemetry-deflation/ > Thanks for the reply Joseph, I have seen the commit message, and I also read the blog you referenced (and other pages related to the same topic) which made us a bit worried. I will try to explain our perspective and impressions when we read those blog pages. It is also worth noting that we have just started engaging with the OpenStack community (so, pardon my ignorance with some parts of OpenStack, and how this OpenSource community works). We are already making some contributions to Kolla-ansible, and we want to start to contribute back to Telemetry as well. Before getting to the topic of Telemetry, and to be more precise, Ceilometer, let me state that I have taken part in other OpenSource projects and communities before, but these communities are managed by a different organization. So, Ceilometer; when we were designing and building our OpenStack Cloud, where billing is a crucial part of it. Ceilometer was chosen because it fits our requirements, working "out of the box" to provide valuable data for billing in a high availability fashion. It for sure lacks some features, but that is ok when one works with OpenSource. The pollers and event managers we are missing, we would like to create and contribute back to the community. Having said that, what puzzled me, and worried us, is the fact that features that work are being removed from a project just because some contributors/committers left the community. There wasn't (at least I did not see) a good technical reason to remove this feature (e.g. it does not deliver what is promised, or an alternative solution has been created somewhere and effort is being concentrated there, nobody uses it, and so on). If the features were broken, and there were no people to fix it, I would understand, but that is not the case. The feature works, and it delivers what is promised. Moreover, reading the blog you referenced does not provide a good feeling about how the community has managed the event (the project losing part of its contributors) in question. OpenSource has cycles, and it is understandable that sometimes we do not have many people working on something. OpenSource projects have cycles, and that is normal. As you can see, now there would be us starting/trying to engage with the Telemetry project/community. What is hard for us to understand is that the contributors while leaving are also "killing" the project by removing part of its features (that are very interesting and valuable for us). Why is that important for us? When we work with OpenSource we now that we might need to put effort to customize/adapt things to our business workflow, and we expect that the community will be there to receive and discuss these changes. Therefore, we have predictability that the software/system we base our business will be there, and we can contribute back to improve it. An open source community could and should live even if the project has no community for a while, then if people regroup and start to work on it again, the community is able to flourish. Events is one feature that often gets requested, but the use cases and > demand for it are not expressed > > strongly or well understood by most people. If the Telemetry project has > demand to de-deprecate > > Event handling (including Panko), I'd suggest a review of the requirements > for event handling and > > possibly choosing a champion for maintaining the Panko service. 
> > > Also note: over in Monasca we have a spec [1] for handling Events > ingestion which I hope we will be > > completing in Train. Contributions and comments welcome. :) > > > joseph > > [0] > https://github.com/openstack/ceilometer/commit/8a0245a5b3e1357d35ad6653be37ca01176577e4 > > [1] > https://github.com/openstack/monasca-specs/blob/master/specs/stein/approved/monasca-events-listener.rst > It is awesome that you might have a similar spec (not developed yet) for Monasca, but the question would remain for us. One, two, or three years from now, what will happen if you, your team, or the people behind this spec/feature decide to leave the community? Will this feature be removed from Monasca too? On Wed, May 8, 2019 at 6:23 PM Joseph Davis wrote: > On 5/8/19 7:12 AM, openstack-discuss-request at lists.openstack.org wrote: > > Hello Trinh, > Where does the meeting happen? Will it be via IRC Telemetry channel? Or, in > the Etherpad (https://etherpad.openstack.org/p/telemetry-meeting-agenda)? I > would like to discuss and understand a bit better the context behind > the Telemetry > events deprecation. > > Unfortunately, I have a conflict at that time and will not be able to > attend. > > I do have a little bit of context on the Events deprecation to share. > > First, you will note the commit message from the commit [0] when Events > were deprecated: > > " > > Deprecate event subsystem > > This subsystem has never been finished and is not maintained. > Deprecate it for future removal. > > " > > I got the impression from jd at the time that there were a number of > features in Telemetry, > > including Panko, that were not really "finished" and that the engineers > who had worked on them > > had moved on to other things, so the features had become unsupported. In > late 2018 there was > > an effort to clean up things that were not well maintained or didn't fit > the direction of Telemetry. > > See also: > https://julien.danjou.info/lessons-from-openstack-telemetry-deflation/ > > > Events is one feature that often gets requested, but the use cases and > demand for it are not expressed > > strongly or well understood by most people. If the Telemetry project has > demand to de-deprecate > > Event handling (including Panko), I'd suggest a review of the requirements > for event handling and > > possibly choosing a champion for maintaining the Panko service. > > > Also note: over in Monasca we have a spec [1] for handling Events > ingestion which I hope we will be > > completing in Train. Contributions and comments welcome. :) > > > joseph > > [0] > https://github.com/openstack/ceilometer/commit/8a0245a5b3e1357d35ad6653be37ca01176577e4 > > [1] > https://github.com/openstack/monasca-specs/blob/master/specs/stein/approved/monasca-events-listener.rst > > > On Wed, May 8, 2019 at 12:19 AM Trinh Nguyen wrote: > > > Hi team, > > As planned, we will have a team meeting at 02:00 UTC, May 9th on > #openstack-telemetry to discuss what we gonna do for the next milestone > (Train-1) and continue what we left off from the last meeting. > > I put here [1] the agenda thinking that it should be fine for an hour > meeting. If you have anything to talk about, please put it there too. > > [1] https://etherpad.openstack.org/p/telemetry-meeting-agenda > > > Bests, > > --**Trinh Nguyen** > *www.edlab.xyz * > > > > -- Rafael Weingärtner -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From joseph.davis at suse.com Thu May 9 01:33:11 2019 From: joseph.davis at suse.com (Joseph Davis) Date: Wed, 8 May 2019 18:33:11 -0700 Subject: [telemetry] Team meeting agenda for tomorrow In-Reply-To: References: <14ff728c-f19e-e869-90b1-4ff37f7170af@suse.com> Message-ID: <51d1e4cd-3e88-8326-a28e-56e267637d83@suse.com> On 5/8/19 5:45 PM, Rafael Weingärtner wrote: > Thanks for the reply Joseph, > > I have seen the commit message, and I also read the blog you > referenced (and other pages related to the same topic) which made us a > bit worried. I will try to explain our perspective and impressions > when we read those blog pages. It is also worth noting that we have > just started engaging with the OpenStack community (so, pardon my > ignorance with some parts of OpenStack, and how this OpenSource > community works). We are already making some contributions to > Kolla-ansible, and we want to start to contribute back to Telemetry as > well. > > Before getting to the topic of Telemetry, and to be more precise, > Ceilometer, let me state that I have taken part in other OpenSource > projects and communities before, but these communities are managed by > a different organization. > > So, Ceilometer; when we were designing and building our OpenStack > Cloud, where billing is a crucial part of it. Ceilometer was chosen > because it fits our requirements, working "out of the box" to provide > valuable data for billing in a high availability fashion. It for sure > lacks some features, but that is ok when one works with OpenSource. > The pollers and event managers we are missing, we would like to create > and contribute back to the community. > > Having said that, what puzzled me, and worried us, is the fact that > features that work are being removed from a project just because some > contributors/committers left the community. There wasn't (at least I > did not see) a good technical reason to remove this feature (e.g. it > does not deliver what is promised, or an alternative solution has been > created somewhere and effort is being concentrated there, nobody uses > it, and so on). If the features were broken, and there were no people > to fix it, I would understand, but that is not the case. The feature > works, and it delivers what is promised. Moreover, reading the blog > you referenced does not provide a good feeling about how the community > has managed the event (the project losing part of its contributors) in > question. OpenSource has cycles, and it is understandable that > sometimes we do not have many people working on something. OpenSource > projects have cycles, and that is normal. As you can see, now there > would be us starting/trying to engage with the Telemetry > project/community. What is hard for us to understand is that the > contributors while leaving are also "killing" the project by removing > part of its features (that are very interesting and valuable for us). > Yeah, the history of Telemetry is a bit unusual in how it developed, and I could give editorials and opinions about decisions that were made and how well they worked in the community, but I'll save that for another time.  I will say that communication with the community could have been better.  And while I think that simplifying Ceilometer was a good choice at the time when the number of contributors was dwindling, I agree that cutting out a feature that is being used by users is not how OpenStack ought to operate. And now I'm starting to give opinions so I'll stop. 
I will say that it may be beneficial to the Telemetry project if you can write out your use case for the Telemetry stack and describe why you want Events to be captured and how you will use them.  Describe how they important to your billing solution (*), and if you are matching the event notifications up with other metering data.  You can discuss with the team in the meeting if that use case and set of requirements goes in Storyboard or elsewhere. (*) I am curious if you are using CloudKitty or another solution. > Why is that important for us? > When we work with OpenSource we now that we might need to put effort > to customize/adapt things to our business workflow, and we expect that > the community will be there to receive and discuss these changes. > Therefore, we have predictability that the software/system we base our > business will be there, and we can contribute back to improve it. An > open source community could and should live even if the project has no > community for a while, then if people regroup and start to work on it > again, the community is able to flourish. I'm really glad you recognize the benefits of contributing back to the community.  It gives me hope. :) > > It is awesome that you might have a similar spec (not developed yet) > for Monasca, but the question would remain for us. One, two, or three > years from now, what will happen if you, your team, or the people > behind this spec/feature decide to leave the community? Will this > feature be removed from Monasca too? Developers leaving the community is a normal part of the lifecycle, so I think you would agree that part of having a healthy project is ensuring that when that happens the project can go on.  Monasca has already seen a number of developers come and go, and will continue on for the foreseeable future.  That is part of why we wrote a spec for the events-listener, so that if needed the work could change hands and continue with context.  We try to plan and get cross-company agreement in the community.  Of course, there are priorities and trade-offs and limits on developers, but Monasca and OpenStack seem to do a good job of being 'open' about it. > > -- > Rafael Weingärtner joseph -------------- next part -------------- An HTML attachment was scrubbed... URL: From dangtrinhnt at gmail.com Thu May 9 01:45:12 2019 From: dangtrinhnt at gmail.com (Trinh Nguyen) Date: Thu, 9 May 2019 10:45:12 +0900 Subject: [telemetry] Team meeting agenda for tomorrow In-Reply-To: <51d1e4cd-3e88-8326-a28e-56e267637d83@suse.com> References: <14ff728c-f19e-e869-90b1-4ff37f7170af@suse.com> <51d1e4cd-3e88-8326-a28e-56e267637d83@suse.com> Message-ID: Thanks, Joseph, Rafael for the great comments. Understanding the user's use-cases is a very important step to make a feature alive. On Thu, May 9, 2019 at 10:33 AM Joseph Davis wrote: > On 5/8/19 5:45 PM, Rafael Weingärtner wrote: > > > Thanks for the reply Joseph, > > I have seen the commit message, and I also read the blog you referenced > (and other pages related to the same topic) which made us a bit worried. I > will try to explain our perspective and impressions when we read those blog > pages. It is also worth noting that we have just started engaging with the > OpenStack community (so, pardon my ignorance with some parts of OpenStack, > and how this OpenSource community works). We are already making some > contributions to Kolla-ansible, and we want to start to contribute back to > Telemetry as well. 
> > Before getting to the topic of Telemetry, and to be more precise, > Ceilometer, let me state that I have taken part in other OpenSource > projects and communities before, but these communities are managed by a > different organization. > > So, Ceilometer; when we were designing and building our OpenStack Cloud, > where billing is a crucial part of it. Ceilometer was chosen because it > fits our requirements, working "out of the box" to provide valuable data > for billing in a high availability fashion. It for sure lacks some > features, but that is ok when one works with OpenSource. The pollers and > event managers we are missing, we would like to create and contribute back > to the community. > > Having said that, what puzzled me, and worried us, is the fact that > features that work are being removed from a project just because some > contributors/committers left the community. There wasn't (at least I did > not see) a good technical reason to remove this feature (e.g. it does not > deliver what is promised, or an alternative solution has been created > somewhere and effort is being concentrated there, nobody uses it, and so > on). If the features were broken, and there were no people to fix it, I > would understand, but that is not the case. The feature works, and it > delivers what is promised. Moreover, reading the blog you referenced does > not provide a good feeling about how the community has managed the event > (the project losing part of its contributors) in question. OpenSource has > cycles, and it is understandable that sometimes we do not have many people > working on something. OpenSource projects have cycles, and that is normal. > As you can see, now there would be us starting/trying to engage with the > Telemetry project/community. What is hard for us to understand is that the > contributors while leaving are also "killing" the project by removing part > of its features (that are very interesting and valuable for us). > > Yeah, the history of Telemetry is a bit unusual in how it developed, and I > could give editorials and opinions about decisions that were made and how > well they worked in the community, but I'll save that for another time. I > will say that communication with the community could have been better. And > while I think that simplifying Ceilometer was a good choice at the time > when the number of contributors was dwindling, I agree that cutting out a > feature that is being used by users is not how OpenStack ought to operate. > And now I'm starting to give opinions so I'll stop. > > I will say that it may be beneficial to the Telemetry project if you can > write out your use case for the Telemetry stack and describe why you want > Events to be captured and how you will use them. Describe how they > important to your billing solution (*), and if you are matching the event > notifications up with other metering data. You can discuss with the team > in the meeting if that use case and set of requirements goes in Storyboard > or elsewhere. > > (*) I am curious if you are using CloudKitty or another solution. > > > Why is that important for us? > When we work with OpenSource we now that we might need to put effort to > customize/adapt things to our business workflow, and we expect that the > community will be there to receive and discuss these changes. Therefore, we > have predictability that the software/system we base our business will be > there, and we can contribute back to improve it. 
An open source community > could and should live even if the project has no community for a while, > then if people regroup and start to work on it again, the community is able > to flourish. > > I'm really glad you recognize the benefits of contributing back to the > community. It gives me hope. :) > > > > It is awesome that you might have a similar spec (not developed yet) for > Monasca, but the question would remain for us. One, two, or three years > from now, what will happen if you, your team, or the people behind this > spec/feature decide to leave the community? Will this feature be removed > from Monasca too? > > Developers leaving the community is a normal part of the lifecycle, so I > think you would agree that part of having a healthy project is ensuring > that when that happens the project can go on. Monasca has already seen a > number of developers come and go, and will continue on for the foreseeable > future. That is part of why we wrote a spec for the events-listener, so > that if needed the work could change hands and continue with context. We > try to plan and get cross-company agreement in the community. Of course, > there are priorities and trade-offs and limits on developers, but Monasca > and OpenStack seem to do a good job of being 'open' about it. > > > > > -- > Rafael Weingärtner > > > joseph > -- *Trinh Nguyen* *www.edlab.xyz * -------------- next part -------------- An HTML attachment was scrubbed... URL: From dangtrinhnt at gmail.com Thu May 9 01:48:29 2019 From: dangtrinhnt at gmail.com (Trinh Nguyen) Date: Thu, 9 May 2019 10:48:29 +0900 Subject: [telemetry] Team meeting agenda for tomorrow In-Reply-To: References: Message-ID: Hi team, It's 12m before the meeting. Bests, On Thu, May 9, 2019 at 12:09 AM Rafael Weingärtner < rafaelweingartner at gmail.com> wrote: > Thanks, I'll be there. > > Em qua, 8 de mai de 2019 11:41, Trinh Nguyen > escreveu: > >> Hi Rafael, >> >> The meeting will be held on the IRC channel #openstack-telemetry as >> mentioned in the previous email. >> >> Thanks, >> >> On Wed, May 8, 2019 at 10:50 PM Rafael Weingärtner < >> rafaelweingartner at gmail.com> wrote: >> >>> Hello Trinh, >>> Where does the meeting happen? Will it be via IRC Telemetry channel? Or, >>> in the Etherpad ( >>> https://etherpad.openstack.org/p/telemetry-meeting-agenda)? I would >>> like to discuss and understand a bit better the context behind the Telemetry >>> events deprecation. >>> >>> On Wed, May 8, 2019 at 12:19 AM Trinh Nguyen >>> wrote: >>> >>>> Hi team, >>>> >>>> As planned, we will have a team meeting at 02:00 UTC, May 9th on >>>> #openstack-telemetry to discuss what we gonna do for the next milestone >>>> (Train-1) and continue what we left off from the last meeting. >>>> >>>> I put here [1] the agenda thinking that it should be fine for an hour >>>> meeting. If you have anything to talk about, please put it there too. >>>> >>>> [1] https://etherpad.openstack.org/p/telemetry-meeting-agenda >>>> >>>> >>>> Bests, >>>> >>>> -- >>>> *Trinh Nguyen* >>>> *www.edlab.xyz * >>>> >>>> >>> >>> -- >>> Rafael Weingärtner >>> >> >> >> -- >> *Trinh Nguyen* >> *www.edlab.xyz * >> >> -- *Trinh Nguyen* *www.edlab.xyz * -------------- next part -------------- An HTML attachment was scrubbed... URL: From whayutin at redhat.com Thu May 9 02:00:45 2019 From: whayutin at redhat.com (Wesley Hayutin) Date: Wed, 8 May 2019 20:00:45 -0600 Subject: [tripleo] CI RED fyi.. 
something is causing both overcloud network configuration issues atm Message-ID: Seem the jobs go red at midnight May 9th UTC. What I'm mainly seeing is the overcloud hang after the ssh keys are created, it seems the overcloud nodes do not have network connectivity. http://logs.openstack.org/47/657547/5/check/tripleo-ci-centos-7-containers-multinode/af82c9f/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz http://logs.openstack.org/47/657547/5/check/tripleo-ci-centos-7-containers-multinode/af82c9f/logs/subnode-2/var/log/extra/failed_services.txt.gz This looks normal eth0, eth1 come up http://logs.openstack.org/47/657547/5/check/tripleo-ci-centos-7-containers-multinode/af82c9f/logs/subnode-2/var/log/journal.txt.gz#_May_08_21_42_19 I'm not 100% if this related to some of the latest patches, or if this impacts all jobs atm. Looking into it. -------------- next part -------------- An HTML attachment was scrubbed... URL: From whayutin at redhat.com Thu May 9 04:22:25 2019 From: whayutin at redhat.com (Wesley Hayutin) Date: Wed, 8 May 2019 22:22:25 -0600 Subject: [tripleo] CI RED fyi.. something is causing both overcloud network configuration issues atm References: Message-ID: On Wed, May 8, 2019 at 8:00 PM Wesley Hayutin wrote: > Seem the jobs go red at midnight May 9th UTC. > > What I'm mainly seeing is the overcloud hang after the ssh keys are > created, it seems the overcloud nodes do not have network connectivity. > > http://logs.openstack.org/47/657547/5/check/tripleo-ci-centos-7-containers-multinode/af82c9f/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz > > > http://logs.openstack.org/47/657547/5/check/tripleo-ci-centos-7-containers-multinode/af82c9f/logs/subnode-2/var/log/extra/failed_services.txt.gz > > This looks normal eth0, eth1 come up > > http://logs.openstack.org/47/657547/5/check/tripleo-ci-centos-7-containers-multinode/af82c9f/logs/subnode-2/var/log/journal.txt.gz#_May_08_21_42_19 > > I'm not 100% if this related to some of the latest patches, or if this > impacts all jobs atm. > Looking into it. > AFAICT, the issue is either with a few of the patches submitted or a blip in infra. A clean test patch is working w/ multinode-containers and ovb fs001 CI should be green'ish quoting #tripleo Sorry for the spam -------------- next part -------------- An HTML attachment was scrubbed... URL: From sundar.nadathur at intel.com Thu May 9 04:37:08 2019 From: sundar.nadathur at intel.com (Nadathur, Sundar) Date: Wed, 8 May 2019 21:37:08 -0700 Subject: [placement][nova][ptg] resource provider affinity In-Reply-To: <1934f31d-da89-071f-d667-c36d965851ae@fried.cc> References: <97bd8e53-0285-1c92-845f-21098b0b0e38@gmail.com> <21aa22e7-be7d-8ecf-b5bd-9c6afcd789f5@fried.cc> <27624C30-2BB6-43DF-9613-783674389C0B@fried.cc> <1556631941.24201.1@smtp.office365.com> <264f10b8-05dc-5280-28af-1f29cae91821@hco.ntt.co.jp> <4aa76244-fce0-86f3-a6f5-cd7f4d8cb2f0@fried.cc> <03922b54-994e-dcae-8543-7c9c2f75b87d@hco.ntt.co.jp> <5fd214e8-4822-53a5-a7d6-622c5133a26f@fried.cc> <1CC272501B5BC543A05DB90AA509DED527557F03@fmsmsx122.amr.corp.intel.com> <1934f31d-da89-071f-d667-c36d965851ae@fried.cc> Message-ID: <489d8cae-9151-5f43-b495-ad51c959a0ea@intel.com> On 5/8/2019 2:31 PM, Eric Fried wrote: > Sundar- > >> I have a set of compute hosts, each with several NICs of type T. Each NIC has a set of PFs: PF1, PF2, .... Each PF is a resource provider, and each has a separate custom RC: CUSTOM_RC_PF1, CUSTOM_RC_PF2, ... . The VFs are inventories of the associated PF's RC. Provider networks etc. 
are traits on that PF. > It would be weird for the inventories to be called PF* if they're > inventories of VF.  I am focusing mainly on the concepts for now, not on the names. > But mainly: why the custom resource classes? This is as elaborate an example as I could cook up. IRL, we may need some custom RC, but maybe not one for each PF type. > The way "resourceless RP" + "same_subtree" is designed to work is best > explained if I model your use case with standard resource classes instead: > > CN > | > +---NIC1 (trait: I_AM_A_NIC) > | | > | +-----PF1_1 (trait: CUSTOM_PHYSNET1, inventory: VF=4) > | | > | +-----PF1_2 (trait: CUSTOM_PHYSNET2, inventory: VF=4) > | > +---NIC2 (trait: I_AM_A_NIC) > | > +-----PF2_1 (trait: CUSTOM_PHYSNET1, inventory: VF=4) > | > +-----PF2_2 (trait: CUSTOM_PHYSNET2, inventory: VF=4) > > Now if I say: > > ?resources_T1=VF:1 > &required_T1=CUSTOM_PHYSNET1 > &resources_T2=VF:1 > &required_T2=CUSTOM_PHYSNET2 > &required_T3=I_AM_A_NIC > &same_subtree=','.join([suffix for suffix in suffixes if > suffix.startswith('_T')]) (i.e. '_T1,_T2,_T3') > > ...then I'll get two candidates: > > - {PF1_1: VF=1, PF1_2: VF=1} <== i.e. both from NIC1 > - {PF2_1: VF=1, PF2_2: VF=1} <== i.e. both from NIC2 > > ...and no candidates where one VF is from each NIC. > > IIUC this is how you wanted it. Yes. The examples in the storyboard [1] for NUMA affinity use group numbers. If that were recast to use named groups, and we wanted NUMA affinity apart from device colocation, would that not require a different name than T? In short, if you want to express 2 different affinities/groupings, perhaps we need to use a name with 2 parts, and use 2 different same_subtree clauses. Just pointing out the implications. BTW, I noticed there is a standard RC for NIC VFs [2]. [1] https://storyboard.openstack.org/#!/story/2005575 [2] https://github.com/openstack/os-resource-classes/blob/master/os_resource_classes/__init__.py#L49 > ============== > > With the custom resource classes, I'm having a hard time understanding > the model. How unique are the _PF$N bits? Do they repeat (a) from one > NIC to the next? (b) From one host to the next? (c) Never? > > The only thing that begins to make sense is (a), because (b) and (c) > would lead to skittles. So assuming (a), the model would look something > like: Yes, (a) is what I had in mind. > CN > | > +---NIC1 (trait: I_AM_A_NIC) > | | > | +-----PF1_1 (trait: CUSTOM_PHYSNET1, inventory: CUSTOM_PF1_VF=4) > | | > | +-----PF1_2 (trait: CUSTOM_PHYSNET2, inventory: CUSTOM_PF2_VF=4) > | > +---NIC2 (trait: I_AM_A_NIC) > | > +-----PF2_1 (trait: CUSTOM_PHYSNET1, inventory: CUSTOM_PF1_VF=4) > | > +-----PF2_2 (trait: CUSTOM_PHYSNET2, inventory: CUSTOM_PF2_VF=4) > > Now you could get the same result with (essentially) the same request as > above: > > ?resources_T1=CUSTOM_PF1_VF:1 > &required_T1=CUSTOM_PHYSNET1 > &resources_T2=CUSTOM_PF2_VF:1 > &required_T2=CUSTOM_PHYSNET2 > &required_T3=I_AM_A_NIC > &same_subtree=','.join([suffix for suffix in suffixes if > suffix.startswith('_T')]) (i.e. '_T1,_T2,_T3') > > ==> > > - {PF1_1: CUSTOM_PF1_VF=1, PF1_2: CUSTOM_PF2_VF=1} > - {PF2_1: CUSTOM_PF1_VF=1, PF2_2: CUSTOM_PF2_VF=1} > > ...except that in this model, PF$N corresponds to PHYSNET$N, so you > wouldn't actually need the required_T$N=CUSTOM_PHYSNET$N to get the same > result: > > ?resources_T1=CUSTOM_PF1_VF:1 > &resources_T2=CUSTOM_PF2_VF:1 > &required_T3=I_AM_A_NIC > &same_subtree=','.join([suffix for suffix in suffixes if > suffix.startswith('_T')]) (i.e. 
'_T1,_T2,_T3') > > ...because you're effectively encoding the physnet into the RC. Which is > not good IMO. > > But either way... > >> Do I have to create a 'resourceless RP' for the NIC card that contains > the individual PF RPs as children nodes? > > ...if you want to be able to request this kind of affinity, then yes, > you do (unless there's some consumable resource on the NIC, in which > case it's not resourceless, but the spirit is the same). This is exactly > what these features are being designed for. Great. Thank you very much for the detailed reply. Regards, Sundar > Thanks, > efried > . > From dangtrinhnt at gmail.com Thu May 9 06:37:00 2019 From: dangtrinhnt at gmail.com (Trinh Nguyen) Date: Thu, 9 May 2019 15:37:00 +0900 Subject: [tc][searchlight] What does Maintenance Mode mean for a project? Message-ID: Hi, Currently, in the project details section of Searchlight page [1], it says we're in the Maintenance Mode. What does that mean? and how we can update it? Thanks, [1] https://www.openstack.org/software/releases/rocky/components/searchlight -- *Trinh Nguyen* *www.edlab.xyz * -------------- next part -------------- An HTML attachment was scrubbed... URL: From Tim.Bell at cern.ch Thu May 9 07:24:43 2019 From: Tim.Bell at cern.ch (Tim Bell) Date: Thu, 9 May 2019 07:24:43 +0000 Subject: [telemetry] Team meeting agenda for tomorrow In-Reply-To: References: <14ff728c-f19e-e869-90b1-4ff37f7170af@suse.com> Message-ID: <20AC2324-24B6-40D1-A0A4-0382BCE430A7@cern.ch> Is it time to rethink the approach to telemetry a bit? Having each project provide its telemetry data (such as Swift with statsd - https://docs.openstack.org/swift/latest/admin/objectstorage-monitoring.html or using a framework like Prometheus)? In the end, the projects are the ones who have the best knowledge of how to get the metrics. Tim From: Rafael Weingärtner Date: Thursday, 9 May 2019 at 02:51 To: Joseph Davis Cc: openstack-discuss , Trinh Nguyen Subject: Re: [telemetry] Team meeting agenda for tomorrow Unfortunately, I have a conflict at that time and will not be able to attend. I do have a little bit of context on the Events deprecation to share. First, you will note the commit message from the commit [0] when Events were deprecated: " Deprecate event subsystem This subsystem has never been finished and is not maintained. Deprecate it for future removal. " I got the impression from jd at the time that there were a number of features in Telemetry, including Panko, that were not really "finished" and that the engineers who had worked on them had moved on to other things, so the features had become unsupported. In late 2018 there was an effort to clean up things that were not well maintained or didn't fit the direction of Telemetry. See also: https://julien.danjou.info/lessons-from-openstack-telemetry-deflation/ Thanks for the reply Joseph, I have seen the commit message, and I also read the blog you referenced (and other pages related to the same topic) which made us a bit worried. I will try to explain our perspective and impressions when we read those blog pages. It is also worth noting that we have just started engaging with the OpenStack community (so, pardon my ignorance with some parts of OpenStack, and how this OpenSource community works). We are already making some contributions to Kolla-ansible, and we want to start to contribute back to Telemetry as well. 
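
For anyone reproducing the kind of setup described above, a single-node devstack can expose dozens of fake compute nodes via the fake virt driver (VIRT_DRIVER=fake together with devstack's NUMBER_FAKE_NOVA_COMPUTE knob, and NOVA_NUM_CELLS for a multi-cell layout -- worth double-checking the exact variable names against the devstack branch in use). Once the fake computes are registered, the cloud still has to be filled with instances before Watcher's workload strategies have anything to optimize. The snippet below is only a rough sketch of that step using openstacksdk; the cloud name, image, flavor and network names are placeholders and not taken from this thread.

    import openstack

    # Rough sketch, not from the thread: boot a batch of tiny instances so
    # the fake compute nodes end up partially packed ("Tetris" fragmentation)
    # before running a Watcher audit. Cloud/image/flavor/network names below
    # are placeholders for whatever the local devstack provides.
    conn = openstack.connect(cloud='devstack-admin')

    image = conn.compute.find_image('cirros-0.4.0-x86_64-disk')
    flavor = conn.compute.find_flavor('m1.tiny')
    network = conn.network.find_network('private')

    servers = []
    for i in range(50):
        servers.append(conn.compute.create_server(
            name='watcher-load-%03d' % i,
            image_id=image.id,
            flavor_id=flavor.id,
            networks=[{'uuid': network.id}],
        ))

    # Wait until everything is ACTIVE before kicking off the audit, so the
    # measured audit time is not polluted by instances still building.
    for server in servers:
        conn.compute.wait_for_server(server)

    print('Booted %d instances across the fake compute nodes' % len(servers))

From there, an audit using one of Watcher's consolidation-oriented strategies (for example vm_workload_consolidation or workload_stabilization -- names worth verifying against the Watcher docs for the release under test) could be timed against the packed-but-fragmented hosts to see how audit duration and action-plan size scale with node and instance counts.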
Before getting to the topic of Telemetry, and to be more precise, Ceilometer, let me state that I have taken part in other OpenSource projects and communities before, but these communities are managed by a different organization. So, Ceilometer; when we were designing and building our OpenStack Cloud, where billing is a crucial part of it. Ceilometer was chosen because it fits our requirements, working "out of the box" to provide valuable data for billing in a high availability fashion. It for sure lacks some features, but that is ok when one works with OpenSource. The pollers and event managers we are missing, we would like to create and contribute back to the community. Having said that, what puzzled me, and worried us, is the fact that features that work are being removed from a project just because some contributors/committers left the community. There wasn't (at least I did not see) a good technical reason to remove this feature (e.g. it does not deliver what is promised, or an alternative solution has been created somewhere and effort is being concentrated there, nobody uses it, and so on). If the features were broken, and there were no people to fix it, I would understand, but that is not the case. The feature works, and it delivers what is promised. Moreover, reading the blog you referenced does not provide a good feeling about how the community has managed the event (the project losing part of its contributors) in question. OpenSource has cycles, and it is understandable that sometimes we do not have many people working on something. OpenSource projects have cycles, and that is normal. As you can see, now there would be us starting/trying to engage with the Telemetry project/community. What is hard for us to understand is that the contributors while leaving are also "killing" the project by removing part of its features (that are very interesting and valuable for us). Why is that important for us? When we work with OpenSource we now that we might need to put effort to customize/adapt things to our business workflow, and we expect that the community will be there to receive and discuss these changes. Therefore, we have predictability that the software/system we base our business will be there, and we can contribute back to improve it. An open source community could and should live even if the project has no community for a while, then if people regroup and start to work on it again, the community is able to flourish. Events is one feature that often gets requested, but the use cases and demand for it are not expressed strongly or well understood by most people. If the Telemetry project has demand to de-deprecate Event handling (including Panko), I'd suggest a review of the requirements for event handling and possibly choosing a champion for maintaining the Panko service. Also note: over in Monasca we have a spec [1] for handling Events ingestion which I hope we will be completing in Train. Contributions and comments welcome. :) joseph [0] https://github.com/openstack/ceilometer/commit/8a0245a5b3e1357d35ad6653be37ca01176577e4 [1] https://github.com/openstack/monasca-specs/blob/master/specs/stein/approved/monasca-events-listener.rst It is awesome that you might have a similar spec (not developed yet) for Monasca, but the question would remain for us. One, two, or three years from now, what will happen if you, your team, or the people behind this spec/feature decide to leave the community? Will this feature be removed from Monasca too? 
On Wed, May 8, 2019 at 6:23 PM Joseph Davis > wrote: On 5/8/19 7:12 AM, openstack-discuss-request at lists.openstack.org wrote: Hello Trinh, Where does the meeting happen? Will it be via IRC Telemetry channel? Or, in the Etherpad (https://etherpad.openstack.org/p/telemetry-meeting-agenda)? I would like to discuss and understand a bit better the context behind the Telemetry events deprecation. Unfortunately, I have a conflict at that time and will not be able to attend. I do have a little bit of context on the Events deprecation to share. First, you will note the commit message from the commit [0] when Events were deprecated: " Deprecate event subsystem This subsystem has never been finished and is not maintained. Deprecate it for future removal. " I got the impression from jd at the time that there were a number of features in Telemetry, including Panko, that were not really "finished" and that the engineers who had worked on them had moved on to other things, so the features had become unsupported. In late 2018 there was an effort to clean up things that were not well maintained or didn't fit the direction of Telemetry. See also: https://julien.danjou.info/lessons-from-openstack-telemetry-deflation/ Events is one feature that often gets requested, but the use cases and demand for it are not expressed strongly or well understood by most people. If the Telemetry project has demand to de-deprecate Event handling (including Panko), I'd suggest a review of the requirements for event handling and possibly choosing a champion for maintaining the Panko service. Also note: over in Monasca we have a spec [1] for handling Events ingestion which I hope we will be completing in Train. Contributions and comments welcome. :) joseph [0] https://github.com/openstack/ceilometer/commit/8a0245a5b3e1357d35ad6653be37ca01176577e4 [1] https://github.com/openstack/monasca-specs/blob/master/specs/stein/approved/monasca-events-listener.rst On Wed, May 8, 2019 at 12:19 AM Trinh Nguyen wrote: Hi team, As planned, we will have a team meeting at 02:00 UTC, May 9th on #openstack-telemetry to discuss what we gonna do for the next milestone (Train-1) and continue what we left off from the last meeting. I put here [1] the agenda thinking that it should be fine for an hour meeting. If you have anything to talk about, please put it there too. [1] https://etherpad.openstack.org/p/telemetry-meeting-agenda Bests, -- *Trinh Nguyen* *www.edlab.xyz * -- Rafael Weingärtner -------------- next part -------------- An HTML attachment was scrubbed... URL: From dangtrinhnt at gmail.com Thu May 9 07:37:35 2019 From: dangtrinhnt at gmail.com (Trinh Nguyen) Date: Thu, 9 May 2019 16:37:35 +0900 Subject: [telemetry] Team meeting agenda for tomorrow In-Reply-To: <20AC2324-24B6-40D1-A0A4-0382BCE430A7@cern.ch> References: <14ff728c-f19e-e869-90b1-4ff37f7170af@suse.com> <20AC2324-24B6-40D1-A0A4-0382BCE430A7@cern.ch> Message-ID: Hi Tim, It's exactly a great time for your idea as we are trying to develop the new roadmap/vision for Telemetry. I put your comment to the brainstorming etherpad [1] [1] https://etherpad.openstack.org/p/telemetry-train-roadmap Bests, On Thu, May 9, 2019 at 4:24 PM Tim Bell wrote: > Is it time to rethink the approach to telemetry a bit? > > > > Having each project provide its telemetry data (such as Swift with statsd > - > https://docs.openstack.org/swift/latest/admin/objectstorage-monitoring.html > > or using a framework like Prometheus)? 
> > > > In the end, the projects are the ones who have the best knowledge of how > to get the metrics. > > > > Tim > > > > *From: *Rafael Weingärtner > *Date: *Thursday, 9 May 2019 at 02:51 > *To: *Joseph Davis > *Cc: *openstack-discuss , Trinh > Nguyen > *Subject: *Re: [telemetry] Team meeting agenda for tomorrow > > > > Unfortunately, I have a conflict at that time and will not be able to > attend. > > I do have a little bit of context on the Events deprecation to share. > > First, you will note the commit message from the commit [0] when Events > were deprecated: > > " > > Deprecate event subsystem > > This subsystem has never been finished and is not maintained. > > Deprecate it for future removal. > > " > > I got the impression from jd at the time that there were a number of > features in Telemetry, > > including Panko, that were not really "finished" and that the engineers > who had worked on them > > had moved on to other things, so the features had become unsupported. In > late 2018 there was > > an effort to clean up things that were not well maintained or didn't fit > the direction of Telemetry. > > See also: > https://julien.danjou.info/lessons-from-openstack-telemetry-deflation/ > > > > Thanks for the reply Joseph, > > I have seen the commit message, and I also read the blog you referenced > (and other pages related to the same topic) which made us a bit worried. I > will try to explain our perspective and impressions when we read those blog > pages. It is also worth noting that we have just started engaging with the > OpenStack community (so, pardon my ignorance with some parts of OpenStack, > and how this OpenSource community works). We are already making some > contributions to Kolla-ansible, and we want to start to contribute back to > Telemetry as well. > > Before getting to the topic of Telemetry, and to be more precise, > Ceilometer, let me state that I have taken part in other OpenSource > projects and communities before, but these communities are managed by a > different organization. > > So, Ceilometer; when we were designing and building our OpenStack Cloud, > where billing is a crucial part of it. Ceilometer was chosen because it > fits our requirements, working "out of the box" to provide valuable data > for billing in a high availability fashion. It for sure lacks some > features, but that is ok when one works with OpenSource. The pollers and > event managers we are missing, we would like to create and contribute back > to the community. > > Having said that, what puzzled me, and worried us, is the fact that > features that work are being removed from a project just because some > contributors/committers left the community. There wasn't (at least I did > not see) a good technical reason to remove this feature (e.g. it does not > deliver what is promised, or an alternative solution has been created > somewhere and effort is being concentrated there, nobody uses it, and so > on). If the features were broken, and there were no people to fix it, I > would understand, but that is not the case. The feature works, and it > delivers what is promised. Moreover, reading the blog you referenced does > not provide a good feeling about how the community has managed the event > (the project losing part of its contributors) in question. OpenSource has > cycles, and it is understandable that sometimes we do not have many people > working on something. OpenSource projects have cycles, and that is normal. 
> As you can see, now there would be us starting/trying to engage with the > Telemetry project/community. What is hard for us to understand is that the > contributors while leaving are also "killing" the project by removing part > of its features (that are very interesting and valuable for us). > > Why is that important for us? > When we work with OpenSource we now that we might need to put effort to > customize/adapt things to our business workflow, and we expect that the > community will be there to receive and discuss these changes. Therefore, we > have predictability that the software/system we base our business will be > there, and we can contribute back to improve it. An open source community > could and should live even if the project has no community for a while, > then if people regroup and start to work on it again, the community is able > to flourish. > > > > Events is one feature that often gets requested, but the use cases and > demand for it are not expressed > > strongly or well understood by most people. If the Telemetry project has > demand to de-deprecate > > Event handling (including Panko), I'd suggest a review of the requirements > for event handling and > > possibly choosing a champion for maintaining the Panko service. > > > > Also note: over in Monasca we have a spec [1] for handling Events > ingestion which I hope we will be > > completing in Train. Contributions and comments welcome. :) > > > > joseph > > [0] > https://github.com/openstack/ceilometer/commit/8a0245a5b3e1357d35ad6653be37ca01176577e4 > > [1] > https://github.com/openstack/monasca-specs/blob/master/specs/stein/approved/monasca-events-listener.rst > > > > It is awesome that you might have a similar spec (not developed yet) for > Monasca, but the question would remain for us. One, two, or three years > from now, what will happen if you, your team, or the people behind this > spec/feature decide to leave the community? Will this feature be removed > from Monasca too? > > > > On Wed, May 8, 2019 at 6:23 PM Joseph Davis wrote: > > On 5/8/19 7:12 AM, openstack-discuss-request at lists.openstack.org wrote: > > Hello Trinh, > > Where does the meeting happen? Will it be via IRC Telemetry channel? Or, in > > the Etherpad (https://etherpad.openstack.org/p/telemetry-meeting-agenda)? I > > would like to discuss and understand a bit better the context behind > > the Telemetry > > events deprecation. > > Unfortunately, I have a conflict at that time and will not be able to > attend. > > I do have a little bit of context on the Events deprecation to share. > > First, you will note the commit message from the commit [0] when Events > were deprecated: > > " > > Deprecate event subsystem > > This subsystem has never been finished and is not maintained. > > Deprecate it for future removal. > > " > > I got the impression from jd at the time that there were a number of > features in Telemetry, > > including Panko, that were not really "finished" and that the engineers > who had worked on them > > had moved on to other things, so the features had become unsupported. In > late 2018 there was > > an effort to clean up things that were not well maintained or didn't fit > the direction of Telemetry. > > See also: > https://julien.danjou.info/lessons-from-openstack-telemetry-deflation/ > > > > Events is one feature that often gets requested, but the use cases and > demand for it are not expressed > > strongly or well understood by most people. 
If the Telemetry project has > demand to de-deprecate > > Event handling (including Panko), I'd suggest a review of the requirements > for event handling and > > possibly choosing a champion for maintaining the Panko service. > > > > Also note: over in Monasca we have a spec [1] for handling Events > ingestion which I hope we will be > > completing in Train. Contributions and comments welcome. :) > > > > joseph > > [0] > https://github.com/openstack/ceilometer/commit/8a0245a5b3e1357d35ad6653be37ca01176577e4 > > [1] > https://github.com/openstack/monasca-specs/blob/master/specs/stein/approved/monasca-events-listener.rst > > > > On Wed, May 8, 2019 at 12:19 AM Trinh Nguyen wrote: > > > > Hi team, > > > > As planned, we will have a team meeting at 02:00 UTC, May 9th on > > #openstack-telemetry to discuss what we gonna do for the next milestone > > (Train-1) and continue what we left off from the last meeting. > > > > I put here [1] the agenda thinking that it should be fine for an hour > > meeting. If you have anything to talk about, please put it there too. > > > > [1] https://etherpad.openstack.org/p/telemetry-meeting-agenda > > > > > > Bests, > > > > -- > > ****Trinh Nguyen** > > *www.edlab.xyz * > > > > > > > > -- > > Rafael Weingärtner > -- *Trinh Nguyen* *www.edlab.xyz * -------------- next part -------------- An HTML attachment was scrubbed... URL: From ekuvaja at redhat.com Thu May 9 07:53:51 2019 From: ekuvaja at redhat.com (Erno Kuvaja) Date: Thu, 9 May 2019 08:53:51 +0100 Subject: [Glance] No team meeting today Message-ID: Hi all, There is no agenda items proposed for todays meeting and I'm still traveling after the Summit/PTG so we will not have weekly meeting today. Lets resume to the normal from next week onwards. Thanks all! Best, Erno "jokke_" Kuvaja From mrunge at matthias-runge.de Thu May 9 08:23:58 2019 From: mrunge at matthias-runge.de (Matthias Runge) Date: Thu, 9 May 2019 10:23:58 +0200 Subject: [telemetry] Team meeting agenda for tomorrow In-Reply-To: References: <14ff728c-f19e-e869-90b1-4ff37f7170af@suse.com> Message-ID: <20190509082357.GA3547@hilbert.berg.ol> On Wed, May 08, 2019 at 09:45:38PM -0300, Rafael Weingärtner wrote: > Having said that, what puzzled me, and worried us, is the fact that > features that work are being removed from a project just because some > contributors/committers left the community. There wasn't (at least I did > not see) a good technical reason to remove this feature (e.g. it does not If I remember correctly, it was the other way around. The idea was to make things cleaner: ceilometer to just gather data and to send it along, gnocchi for storage, panko for events, etc. > deliver what is promised, or an alternative solution has been created > somewhere and effort is being concentrated there, nobody uses it, and so > on). If the features were broken, and there were no people to fix it, I > would understand, but that is not the case. The feature works, and it > delivers what is promised. Moreover, reading the blog you referenced does > not provide a good feeling about how the community has managed the event > (the project losing part of its contributors) in question. OpenSource has > cycles, and it is understandable that sometimes we do not have many people > working on something. OpenSource projects have cycles, and that is normal. > As you can see, now there would be us starting/trying to engage with the > Telemetry project/community. 
What is hard for us to understand is that the > contributors while leaving are also "killing" the project by removing part > of its features (that are very interesting and valuable for us). So, let's take your understanding what/how OpenSource works aside, please. I am sure, nobody is trying to kill their baby when leaving a project. > > Why is that important for us? > When we work with OpenSource we now that we might need to put effort to > customize/adapt things to our business workflow, and we expect that the > community will be there to receive and discuss these changes. Therefore, we > have predictability that the software/system we base our business will be > there, and we can contribute back to improve it. An open source community > could and should live even if the project has no community for a while, > then if people regroup and start to work on it again, the community is able > to flourish. Right. We're at the point "after no community", and it is up to the community to start something new, taking over the corresponding code (if they choose to do so). Matthias -- Matthias Runge From mrunge at matthias-runge.de Thu May 9 08:35:58 2019 From: mrunge at matthias-runge.de (Matthias Runge) Date: Thu, 9 May 2019 10:35:58 +0200 Subject: [telemetry] Team meeting agenda for tomorrow In-Reply-To: <20AC2324-24B6-40D1-A0A4-0382BCE430A7@cern.ch> References: <14ff728c-f19e-e869-90b1-4ff37f7170af@suse.com> <20AC2324-24B6-40D1-A0A4-0382BCE430A7@cern.ch> Message-ID: <20190509083558.GB3547@hilbert.berg.ol> On Thu, May 09, 2019 at 07:24:43AM +0000, Tim Bell wrote: > Is it time to rethink the approach to telemetry a bit? > > Having each project provide its telemetry data (such as Swift with statsd - https://docs.openstack.org/swift/latest/admin/objectstorage-monitoring.html > or using a framework like Prometheus)? > > In the end, the projects are the ones who have the best knowledge of how to get the metrics. > > Tim Yes please! I'd have some ideas, here. Prometheus has been mentioned so many times now as a requirement/request. There are also other projects to mention here, such as collectd, or OPNFV Barometer. Unfortunately, having a meetig at 4 am in the morning does not really work for me. May I kindly request to move the meeting to a more friendly hour? -- Matthias Runge From witold.bedyk at suse.com Thu May 9 08:42:39 2019 From: witold.bedyk at suse.com (Witek Bedyk) Date: Thu, 9 May 2019 10:42:39 +0200 Subject: [telemetry][monasca][self-healing] Team meeting agenda for tomorrow In-Reply-To: <20AC2324-24B6-40D1-A0A4-0382BCE430A7@cern.ch> References: <14ff728c-f19e-e869-90b1-4ff37f7170af@suse.com> <20AC2324-24B6-40D1-A0A4-0382BCE430A7@cern.ch> Message-ID: <07e6e24d-4b80-8a97-077e-e6e9b39ba15e@suse.com> Agree. Instrumenting the code is the most efficient and recommended way to monitor the applications. We have discussed it during the Self-healing SIG PTG session last week. The problem is that telemetry topic is not and never will be high priority for individual projects so the coordination effort from community is required here. I thinks this is one of the areas where Telemetry and Monasca teams could work together on. Cheers Witek On 5/9/19 9:24 AM, Tim Bell wrote: > Is it time to rethink the approach to telemetry a bit? > > Having each project provide its telemetry data (such as Swift with > statsd - > https://docs.openstack.org/swift/latest/admin/objectstorage-monitoring.html > > or using a framework like Prometheus)? 
> > In the end, the projects are the ones who have the best knowledge of how > to get the metrics. > > Tim > > *From: *Rafael Weingärtner > *Date: *Thursday, 9 May 2019 at 02:51 > *To: *Joseph Davis > *Cc: *openstack-discuss , Trinh > Nguyen > *Subject: *Re: [telemetry] Team meeting agenda for tomorrow > > Unfortunately, I have a conflict at that time and will not be able > to attend. > > I do have a little bit of context on the Events deprecation to share. > > First, you will note the commit message from the commit [0] when > Events were deprecated: > > " > > Deprecate event subsystem > > This subsystem has never been finished and is not maintained. > > Deprecate it for future removal. > > " > > I got the impression from jd at the time that there were a number of > features in Telemetry, > > including Panko, that were not really "finished" and that the > engineers who had worked on them > > had moved on to other things, so the features had become > unsupported.  In late 2018 there was > > an effort to clean up things that were not well maintained or didn't > fit the direction of Telemetry. > > See also: > https://julien.danjou.info/lessons-from-openstack-telemetry-deflation/ > > Thanks for the reply Joseph, > > I have seen the commit message, and I also read the blog you referenced > (and other pages related to the same topic) which made us a bit worried. > I will try to explain our perspective and impressions when we read those > blog pages. It is also worth noting that we have just started engaging > with the OpenStack community (so, pardon my ignorance with some parts of > OpenStack, and how this OpenSource community works). We are already > making some contributions to Kolla-ansible, and we want to start to > contribute back to Telemetry as well. > > Before getting to the topic of Telemetry, and to be more precise, > Ceilometer, let me state that I have taken part in other OpenSource > projects and communities before, but these communities are managed by a > different organization. > > So, Ceilometer; when we were designing and building our OpenStack Cloud, > where billing is a crucial part of it. Ceilometer was chosen because it > fits our requirements, working "out of the box" to provide valuable data > for billing in a high availability fashion. It for sure lacks some > features, but that is ok when one works with OpenSource. The pollers and > event managers we are missing, we would like to create and contribute > back to the community. > > Having said that, what puzzled me, and worried us, is the fact that > features that work are being removed from a project just because some > contributors/committers left the community. There wasn't (at least I did > not see) a good technical reason to remove this feature (e.g. it does > not deliver what is promised, or an alternative solution has been > created somewhere and effort is being concentrated there, nobody uses > it, and so on). If the features were broken, and there were no people to > fix it, I would understand, but that is not the case. The feature works, > and it delivers what is promised. Moreover, reading the blog you > referenced does not provide a good feeling about how the community has > managed the event (the project losing part of its contributors) in > question. OpenSource has cycles, and it is understandable that sometimes > we do not have many people working on something. OpenSource projects > have cycles, and that is normal. 
As you can see, now there would be us > starting/trying to engage with the Telemetry project/community. What is > hard for us to understand is that the contributors while leaving are > also "killing" the project by removing part of its features (that are > very interesting and valuable for us). > > Why is that important for us? > When we work with OpenSource we now that we might need to put effort to > customize/adapt things to our business workflow, and we expect that the > community will be there to receive and discuss these changes. Therefore, > we have predictability that the software/system we base our business > will be there, and we can contribute back to improve it. An open source > community could and should live even if the project has no community for > a while, then if people regroup and start to work on it again, the > community is able to flourish. > > Events is one feature that often gets requested, but the use cases > and demand for it are not expressed > > strongly or well understood by most people.  If the Telemetry > project has demand to de-deprecate > > Event handling (including Panko), I'd suggest a review of the > requirements for event handling and > > possibly choosing a champion for maintaining the Panko service. > > Also note: over in Monasca we have a spec [1] for handling Events > ingestion which I hope we will be > > completing in Train.  Contributions and comments welcome. :) > > joseph > > [0] > https://github.com/openstack/ceilometer/commit/8a0245a5b3e1357d35ad6653be37ca01176577e4 > > [1] > https://github.com/openstack/monasca-specs/blob/master/specs/stein/approved/monasca-events-listener.rst > > It is awesome that you might have a similar spec (not developed yet) for > Monasca, but the question would remain for us. One, two, or three years > from now, what will happen if you, your team, or the people behind this > spec/feature decide to leave the community? Will this feature be removed > from Monasca too? > > On Wed, May 8, 2019 at 6:23 PM Joseph Davis > wrote: > > On 5/8/19 7:12 AM, openstack-discuss-request at lists.openstack.org > wrote: > > Hello Trinh, > > Where does the meeting happen? Will it be via IRC Telemetry channel? Or, in > > the Etherpad (https://etherpad.openstack.org/p/telemetry-meeting-agenda)? I > > would like to discuss and understand a bit better the context behind > > the Telemetry > > events deprecation. > > Unfortunately, I have a conflict at that time and will not be able > to attend. > > I do have a little bit of context on the Events deprecation to share. > > First, you will note the commit message from the commit [0] when > Events were deprecated: > > " > > Deprecate event subsystem > > This subsystem has never been finished and is not maintained. > > Deprecate it for future removal. > > " > > I got the impression from jd at the time that there were a number of > features in Telemetry, > > including Panko, that were not really "finished" and that the > engineers who had worked on them > > had moved on to other things, so the features had become > unsupported.  In late 2018 there was > > an effort to clean up things that were not well maintained or didn't > fit the direction of Telemetry. > > See also: > https://julien.danjou.info/lessons-from-openstack-telemetry-deflation/ > > Events is one feature that often gets requested, but the use cases > and demand for it are not expressed > > strongly or well understood by most people.  
If the Telemetry > project has demand to de-deprecate > > Event handling (including Panko), I'd suggest a review of the > requirements for event handling and > > possibly choosing a champion for maintaining the Panko service. > > Also note: over in Monasca we have a spec [1] for handling Events > ingestion which I hope we will be > > completing in Train.  Contributions and comments welcome. :) > > joseph > > [0] > https://github.com/openstack/ceilometer/commit/8a0245a5b3e1357d35ad6653be37ca01176577e4 > > [1] > https://github.com/openstack/monasca-specs/blob/master/specs/stein/approved/monasca-events-listener.rst > > On Wed, May 8, 2019 at 12:19 AM Trinh Nguyen wrote: > > Hi team, > > As planned, we will have a team meeting at 02:00 UTC, May 9th on > > #openstack-telemetry to discuss what we gonna do for the > next milestone > > (Train-1) and continue what we left off from the last meeting. > > I put here [1] the agenda thinking that it should be fine > for an hour > > meeting. If you have anything to talk about, please put it > there too. > > [1] https://etherpad.openstack.org/p/telemetry-meeting-agenda > > Bests, > > -- > > ****Trinh Nguyen** > > *www.edlab.xyz > * > > > > -- > > Rafael Weingärtner > From jose.castro.leon at cern.ch Thu May 9 08:45:52 2019 From: jose.castro.leon at cern.ch (Jose Castro Leon) Date: Thu, 9 May 2019 08:45:52 +0000 Subject: [watcher][qa] Thoughts on performance testing for Watcher In-Reply-To: <201905081419177826734@zte.com.cn> References: 6409b4e4-29af-da6d-1af6-a0d6e753049c@gmail.com <201905081419177826734@zte.com.cn> Message-ID: Hi, Actually, we are working on providing such feature in combination with aardvark. The idea is to create a strategy that fills up with preemptible resources, that later on could be reclaimed by aardvark if a normal instance is deployed. https://www.openstack.org/summit/berlin-2018/summit-schedule/events/22248/towards-fully-automated-cern-private-cloud https://www.openstack.org/summit/denver-2019/summit-schedule/events/23187/improving-resource-availability-in-cern-private-cloud Cheers Jose Castro Leon CERN Cloud Team On Wed, 2019-05-08 at 14:19 +0800, li.canwei2 at zte.com.cn wrote: another note, Watcher provides a WORKLOAD optimization(balancing or consolidation). If you want to maximize the node resource (such as vCPU, Ram...) usage through VM migration, Watcher doesn't have such a strategy now. Thanks! licanwei 原始邮件 发件人:MattRiedemann 收件人:openstack-discuss at lists.openstack.org ; 日 期 :2019年05月08日 04:57 主 题 :[watcher][qa] Thoughts on performance testing for Watcher Hi, I'm new to Watcher and would like to do some performance and scale testing in a simulated environment and wondering if anyone can give some pointers on what I could be testing or looking for. If possible, I'd like to be able to just setup a single-node devstack with the nova fake virt driver which allows me to create dozens of fake compute nodes. I could also create multiple cells with devstack, but there gets to be a limit with how much you can cram into a single node 8GB RAM 8VCPU VM (I could maybe split 20 nodes across 2 cells). I could then create dozens of VMs to fill into those compute nodes. I'm mostly trying to figure out what could be an interesting set of tests. The biggest problem I'm trying to solve with Watcher is optimizing resource utilization, i.e. once the computes hit the Tetris problem and there is some room on some nodes but none of the nodes are fully packed. 
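As a reference point for the single-node fake-driver setup mentioned above, a
minimal devstack local.conf sketch (the variable names are assumed from
devstack's fake virt driver support; the values are only an example):

    [[local|localrc]]
    # Use the fake virt driver so no real guests are spawned
    VIRT_DRIVER=fake
    # Ask devstack to start this many nova-compute services on the one node
    NUMBER_FAKE_NOVA_COMPUTE=20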
I was thinking I could simulate this by configuring nova so it spreads rather than packs VMs onto hosts (or just use the chance scheduler which randomly picks a host), using VMs of varying sizes, and then run some audit / action plan (I'm still learning the terminology here) to live migrate the VMs such that they get packed onto as few hosts as possible and see how long that takes. Naturally with devstack using fake nodes and no networking on the VMs, that live migration is basically a noop, but I'm more interested in profiling how long it takes Watcher itself to execute the actions. Once I get to know a bit more about how Watcher works, I could help with optimizing some of the nova-specific stuff using placement [1]. Any advice or guidance here would be appreciated. [1] https://review.opendev.org/#/c/656448/ -- Thanks, Matt -------------- next part -------------- An HTML attachment was scrubbed... URL: From zigo at debian.org Thu May 9 09:02:15 2019 From: zigo at debian.org (Thomas Goirand) Date: Thu, 9 May 2019 11:02:15 +0200 Subject: [telemetry] Team meeting agenda for tomorrow In-Reply-To: <20AC2324-24B6-40D1-A0A4-0382BCE430A7@cern.ch> References: <14ff728c-f19e-e869-90b1-4ff37f7170af@suse.com> <20AC2324-24B6-40D1-A0A4-0382BCE430A7@cern.ch> Message-ID: <1894ef89-ea11-0d31-4820-dc1c39ed07b7@debian.org> On 5/9/19 9:24 AM, Tim Bell wrote: > Is it time to rethink the approach to telemetry a bit? > > Having each project provide its telemetry data (such as Swift with > statsd - > https://docs.openstack.org/swift/latest/admin/objectstorage-monitoring.html > > or using a framework like Prometheus)? > > In the end, the projects are the ones who have the best knowledge of how > to get the metrics. > > Tim Tim, statsd for swift is for monitoring, it is *not* a usage metric. Likewise with Prometheus, who wont care if some data are missing. I very much would love to have each project handle metrics collection by themselves. Especially, I always though that the polling system implemented in Ceilometer is just wrong, and that every service must be able to report itself rather than being polled. I understand however that doing polling is easier than implementing such change in every service, so I get why it has been done this way. But then we need some kind of timeseries framework within OpenStack as a whole (through an Oslo library?), and also we must decide on a backend. Right now, the only serious thing we have is Gnocchi, since influxdb is gone through the open core model. Or do you have something else to suggest? Cheers, Thomas Goirand (zigo) From balazs.gibizer at ericsson.com Thu May 9 09:19:02 2019 From: balazs.gibizer at ericsson.com (=?iso-8859-1?Q?Bal=E1zs_Gibizer?=) Date: Thu, 9 May 2019 09:19:02 +0000 Subject: [nova][neutron][ptg] Summary: Leaking resources when ports are deleted out-of-band In-Reply-To: References: <62ef48e0-9425-9191-a648-c1009c1032b7@fried.cc> <1556919312.16566.2@smtp.office365.com> <5f87ea30-0bdf-31a4-a3f5-0e9d201b3665@gmail.com> <1556989044.27606.0@smtp.office365.com> Message-ID: <1557393539.17816.4@smtp.office365.com> On Wed, May 8, 2019 at 6:18 PM, Matt Riedemann wrote: > On 5/4/2019 11:57 AM, Balázs Gibizer wrote: >> The failure to detach a port via nova while the nova-compute is down >> could be a bug on nova side. > > Depends on what you mean by detach. 
If the compute is down while > deleting the server, the API will still call the (internal to nova) > network API code [1] to either (a) unbind ports that nova didn't > create or (2) delete ports that nova did create. This sentence based on the reported bug [2]. The reason while Octavia is unbinding the port in Neutron instead of via Nova is that Nova fails to detach the interface and unbind the port if the nova-compute is down. In that bug we discussing if it would be meaningful to do a local interface detach (unvind port in neutron + deallocate port resource in placement) in the nova-api if the compute is done similar to the local server delete. [2] https://bugs.launchpad.net/nova/+bug/1827746 > > For the policy change where the port has to be unbound to delete it, > we'd already have support for that, it's just an extra step. > > At the PTG I was groaning a bit about needing to add another step to > delete a port from the nova side, but thinking about it more we have > to do the exact same thing with cinder volumes (we have to detach > them before deleting them), so I guess it's not the worst thing ever. As soon as somebody from Neutron states that the neutron policy patch is on the way I can start working on the Nova side of this. Cheers, gibi > > [1] > https://protect2.fireeye.com/url?k=56f34fb5-0a7a9599-56f30f2e-0cc47ad93da2-193a4612d9e0575f&u=https://github.com/openstack/nova/blob/56fef7c0e74d7512f062c4046def10401df16565/nova/compute/api.py#L2291 > > -- > > Thanks, > > Matt > From geguileo at redhat.com Thu May 9 09:28:28 2019 From: geguileo at redhat.com (Gorka Eguileor) Date: Thu, 9 May 2019 11:28:28 +0200 Subject: Baremetal attach volume in Multi-tenancy In-Reply-To: References: Message-ID: <20190509092828.g6qvdg5jbvqqvpba@localhost> On 08/05, zack chen wrote: > Hi, > I am looking for a mechanism that can be used for baremetal attach volume > in a multi-tenant scenario. In addition we use ceph as the backend storage > for cinder. > > Can anybody give me some advice? Hi, Is this a stand alone Cinder deployment or a normal Cinder in OpenStack deployment? What storage backend will you be using? What storage protocol? iSCSI, FC, RBD...? Depending on these you can go with Walter's suggestion of using cinderclient and its extension (which in general is the best way to go), or you may prefer writing a small python script that uses OS-Brick and makes the REST API calls directly. Cheers, Gorka. From sylvain.bauza at gmail.com Thu May 9 09:28:42 2019 From: sylvain.bauza at gmail.com (Sylvain Bauza) Date: Thu, 9 May 2019 11:28:42 +0200 Subject: [nova][CI] GPUs in the gate In-Reply-To: References: <3587e05d-deab-42ad-9a02-4312ca11760f@www.fastmail.com> <20190508132709.xgq6nz3mqkfw3q5d@yuggoth.org> Message-ID: Le mer. 8 mai 2019 à 20:27, Artom Lifshitz a écrit : > On Wed, May 8, 2019 at 9:30 AM Jeremy Stanley wrote: > > Long shot, but since you just need the feature provided and not the > > performance it usually implies, are there maybe any open source > > emulators which provide the same instruction set for conformance > > testing purposes? > > Something like that exists for network cards. It's called netdevsim > [1], and it's been mentioned in the SRIOV live migration spec [2]. > However to my knowledge nothing like that exists for GPUs. > > libvirt provides us a way to fake mediated devices attached to instances but we still need to lookup sysfs for either knowing all the physical GPUs or creating a new mdev so that's where it's not possibleto have an emulator AFAICU. 
-Sylvain [1] > https://www.phoronix.com/scan.php?page=news_item&px=Linux-4.16-Networking > [2] > https://specs.openstack.org/openstack/nova-specs/specs/train/approved/libvirt-neutron-sriov-livemigration.html#testing > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From kchamart at redhat.com Thu May 9 11:55:46 2019 From: kchamart at redhat.com (Kashyap Chamarthy) Date: Thu, 9 May 2019 13:55:46 +0200 Subject: [nova][all][ptg] Summary: Same-Company Approvals In-Reply-To: References: Message-ID: <20190509115546.GG28897@paraplu> On Sat, May 04, 2019 at 07:19:48PM -0600, Morgan Fainberg wrote: > On Sat, May 4, 2019, 16:48 Eric Fried wrote: > > > (NB: I tagged [all] because it would be interesting to know where other > > teams stand on this issue.) > > > > Etherpad: https://etherpad.openstack.org/p/nova-ptg-train-governance [Thanks for the summary, Eric; I couldn't be at that session due to a conflict.] > > Summary: > > - There is a (currently unwritten? at least for Nova) rule that a patch > > should not be approved exclusively by cores from the same company. This > > is rife with nuance, including but not limited to: > > - Usually (but not always) relevant when the patch was proposed by > > member of same company > > - N/A for trivial things like typo fixes > > - The issue is: > > - Should the rule be abolished? and/or > > - Should the rule be written down? > > [...] > we opted to really lean on "Overall, we should be able to trust cores > to act in good faith". Indeed. IME, this is what other mature open source projects do (e.g. Linux and QEMU, which are in the "critical path" to Nova and OpenStack). FWIW, over the past six years, I've seen plenty of cases on 'qemu-devel' (the upstream development list fo the QEMU project) and on KVM list, where a (non-trivial) patch contribution from company-X is merged by maintainers from company-X. Of course, there is the implicit trust in that the contributor is acting in upstream's best interests first. (If not, course-correct and educate.) - - - This Nova "rule" (which, as Eric succintly put it, is "rife with nuance") doesn't affect me much, if at all. But allow me share my stance: I'm of course all for diverse set of opinions and reviews from different companies as much as posisble, which I consider super healthy. So long as there are no overly iron-clad "rules" that are "unbendable". What _should_ raise a red flag is a _pattern_. E.g. Developer-A from the company Pied Piper uploads a complex change, within a couple of days (or worse, even shorter), two more upstream "core" reivewers from Pied Piper, who are in the know about the change, pile on it and approve without giving sufficient time for other community reviewers to catch-up. (Because: "hey, we need to get Pied Piper's 'priority feature' into the current release, to get that one-up against the competitor!") *That* kind of behaviour should be called out and rebuked. _However_. 
If: - a particular (non-trivial) change is publicly advertized well-enough (2 weeks or more), for the community developers to catch up; - all necessary details, context and use cases are described clearly in the open, without any assumptions; - if you've checked with other non-Pied Piper "cores" if they have any strong opinions, and gave them the time to chime in; - if the patch receives negative comments, address it without hand-waving, explaining in _every_ detail that isn't clear to non-Pied Piper reviewers, so that in the end they can come to a clear conclusion whether it's right or not; - you are working in the best interests of upstream, _even if_ it goes against your downstream's interests; i.e. you're sincere and sensible when operating with your "upstream hat". Then, "core" reviewers from Pied Piper _should_ be able to merge a contribution from Pied Piper (or someone else), without a nauseating feeling of "you're just one 'wrong sneeze' away from being implicated of 'trust erosion'!" or any artificial "procedural blockers". Of course, this requires a "heightend sense of awareness", and doing that delicate tango of balancing "upstream" vs. "downstream" hats. And I'd like to imagine contributors and maintainers are constantly striving towards it. [...] -- /kashyap From tobias.rydberg at citynetwork.eu Thu May 9 12:01:16 2019 From: tobias.rydberg at citynetwork.eu (Tobias Rydberg) Date: Thu, 9 May 2019 14:01:16 +0200 Subject: [sigs][publiccloud][publiccloud-wg] Reminder meeting this afternoon for Public Cloud WG/SIG Message-ID: <1887c685-4404-39b9-7428-15792b87a80f@citynetwork.eu> Hi all, This is a reminder for todays meeting for the Public Cloud WG/SIG - 1400 UTC in #openstack-publiccloud. Agenda at: https://etherpad.openstack.org/p/publiccloud-wg See you all later! Cheers, Tobias -- Tobias Rydberg Senior Developer Twitter & IRC: tobberydberg www.citynetwork.eu | www.citycloud.com INNOVATION THROUGH OPEN IT INFRASTRUCTURE ISO 9001, 14001, 27001, 27015 & 27018 CERTIFIED From thierry at openstack.org Thu May 9 12:10:03 2019 From: thierry at openstack.org (Thierry Carrez) Date: Thu, 9 May 2019 14:10:03 +0200 Subject: [all][requirements][stable] requests version bump on stable brances {pike|queens} for CVE-2018-18074 In-Reply-To: <20190508142758.gbio47mo3f7pfpgz@yuggoth.org> References: <20190507203022.ctlwkqh4awa5z3ez@mthode.org> <20190508142758.gbio47mo3f7pfpgz@yuggoth.org> Message-ID: Jeremy Stanley wrote: > [...] > It's still unclear to me why we're doing this at all. Our stable > constraints lists are supposed to be a snapshot in time from when we > released, modulo stable point release updates of the libraries we're > maintaining. Agreeing to bump random dependencies on stable branches > because of security vulnerabilities in them is a slippery slope > toward our users expecting the project to be on top of vulnerability > announcements for every one of the ~600 packages in our constraints > list. Deployment projects already should not depend on our > requirements team tracking security vulnerabilities, so need to have > a mechanism to override constraints entries anyway if they're making > such guarantees to their users (and I would also caution against > doing that too). > > Distributions are far better equipped than our project to handle > such tracking, as they generally get advance notice of > vulnerabilities and selectively backport fixes for them. 
Trying to > accomplish the same with a mix of old and new dependency versions in > our increasingly aging stable and extended maintenance branches > seems like a disaster waiting to happen. I agree it is a bit of a slippery slope... We historically did not do that (stable branches are a convenience, not a product), because it is a lot of work to track and test vulnerable dependencies across multiple stable branches in a comprehensive manner. Why update requests 2.18.4 for CVE-2018-18074, and not Jinja2 2.10.0 for CVE-2019-8341 ? I'm not sure doing it on a case-by-case basis is a good idea either, as it might set unreasonable expectations. -- Thierry Carrez (ttx) From thierry at openstack.org Thu May 9 12:24:31 2019 From: thierry at openstack.org (Thierry Carrez) Date: Thu, 9 May 2019 14:24:31 +0200 Subject: [tc] Proposal: restrict TC activities In-Reply-To: <20190503204942.GB28010@shipstone.jp> References: <20190503204942.GB28010@shipstone.jp> Message-ID: <0b579079-265e-c1ee-bd87-261566b1a6af@openstack.org> Emmet Hikory wrote: > [...] As such, I suggest that the Technical Committee be > restricted from actually doing anything beyond approval of merges to the > governance repository. If you look at the documented role of the TC[1], you'll see that it is mostly focused on deciding on proposed governance (or governance repository) changes. The only section that does not directly translate into governance change approval is "Ensuring a healthy, open collaboration", which is about tracking that the project still lives by its documented values, principles and rules -- activities that I think should also remain with the TC. [1] https://governance.openstack.org/tc/reference/role-of-the-tc.html Beyond that, it is true that some Technical Committee members are involved in driving other initiatives (including *proposing* governance changes), but I'd say that they do it like any other community member could. While I think we should (continue to) encourage participation in governance from other people, and ensure a healthy turnover level in TC membership, I don't think that we should *restrict* TC members from voluntarily doing things beyond approving changes. -- Thierry Carrez (ttx) From mark at stackhpc.com Thu May 9 12:26:40 2019 From: mark at stackhpc.com (Mark Goddard) Date: Thu, 9 May 2019 13:26:40 +0100 Subject: kolla-ansible pike - nova_compute containers not starting In-Reply-To: References: Message-ID: On Wed, 8 May 2019 at 16:07, Shyam Biradar wrote: > Hi, > > I am setting up all-in-one ubuntu based kolla-ansible pike openstack. > > Deployment is failing at following ansible task: > TASK [nova : include_tasks] > ********************************************************************************************************************** > included: > /root/virtnev/share/kolla-ansible/ansible/roles/nova/tasks/discover_computes.yml > for localhost > > TASK [nova : Waiting for nova-compute service up] > ************************************************************************************************ > FAILED - RETRYING: Waiting for nova-compute service up (20 retries left). > FAILED - RETRYING: Waiting for nova-compute service up (19 retries left). > FAILED - RETRYING: Waiting for nova-compute service up (18 retries left). > FAILED - RETRYING: Waiting for nova-compute service up (17 retries left). > FAILED - RETRYING: Waiting for nova-compute service up (16 retries left). > FAILED - RETRYING: Waiting for nova-compute service up (15 retries left). 
> FAILED - RETRYING: Waiting for nova-compute service up (14 retries left). > FAILED - RETRYING: Waiting for nova-compute service up (13 retries left). > FAILED - RETRYING: Waiting for nova-compute service up (12 retries left). > FAILED - RETRYING: Waiting for nova-compute service up (11 retries left). > FAILED - RETRYING: Waiting for nova-compute service up (10 retries left). > FAILED - RETRYING: Waiting for nova-compute service up (9 retries left). > FAILED - RETRYING: Waiting for nova-compute service up (8 retries left). > FAILED - RETRYING: Waiting for nova-compute service up (7 retries left). > FAILED - RETRYING: Waiting for nova-compute service up (6 retries left). > FAILED - RETRYING: Waiting for nova-compute service up (5 retries left). > FAILED - RETRYING: Waiting for nova-compute service up (4 retries left). > FAILED - RETRYING: Waiting for nova-compute service up (3 retries left). > FAILED - RETRYING: Waiting for nova-compute service up (2 retries left). > FAILED - RETRYING: Waiting for nova-compute service up (1 retries left). > fatal: [localhost -> localhost]: FAILED! => {"attempts": 20, "changed": > false, "cmd": ["docker", "exec", "kolla_toolbox", "openstack", > "--os-interface", "internal", "--os-auth-url", " > http://192.168.122.151:35357", "--os-identity-api-version", "3", > "--os-project-domain-name", "default", "--os-tenant-name", "admin", > "--os-username", "admin", "--os-password", > "ivpu1km8qxnVQESvAF4cyTFstOvrbxGUHjFF15gZ", "--os-user-domain-name", > "default", "compute", "service", "list", "-f", "json", "--service", > "nova-compute"], "delta": "0:00:02.555356", "end": "2019-05-02 > 09:24:45.485786", "rc": 0, "start": "2019-05-02 09:24:42.930430", "stderr": > "", "stderr_lines": [], "stdout": "[]", "stdout_lines": ["[]"]} > > -------------------------------------------------------------------- > > I can see following stack trace in nova-compute container log > > 4. 2019-05-02 08:21:30.522 7 INFO nova.service [-] Starting compute node > (version 16.1.7) > 2019-05-02 08:21:30.524 7 ERROR oslo_service.service [-] Error starting > thread.: PlacementNotConfigured: This compute is not configured to talk to > the placement service. Configure the [placement] section of nova.conf and > restart the service. > 2019-05-02 08:21:30.524 7 ERROR oslo_service.service Traceback (most > recent call last): > 2019-05-02 08:21:30.524 7 ERROR oslo_service.service File > "/var/lib/kolla/venv/local/lib/python2.7/site-packages/oslo_service/service.py", > line 721, in run_service > 2019-05-02 08:21:30.524 7 ERROR oslo_service.service service.start() > 2019-05-02 08:21:30.524 7 ERROR oslo_service.service File > "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/service.py", > line 156, in start > 2019-05-02 08:21:30.524 7 ERROR oslo_service.service > self.manager.init_host() > 2019-05-02 08:21:30.524 7 ERROR oslo_service.service File > "/var/lib/kolla/venv/local/lib/python2.7/site-packages/nova/compute/manager.py", > line 1155, in init_host > 2019-05-02 08:21:30.524 7 ERROR oslo_service.service raise > exception.PlacementNotConfigured() > 2019-05-02 08:21:30.524 7 ERROR oslo_service.service > PlacementNotConfigured: This compute is not configured to talk to the > placement service. Configure the [placement] section of nova.conf and > restart the service. 
> 2019-05-02 08:21:30.524 7 ERROR oslo_service.service
> 2019-05-02 08:21:59.229 7 INFO os_vif [-] Loaded VIF plugins: ovs,
> linux_bridge
> ---------------------------------------------------------------------
>
> I saw nova-compute nova.conf has [placement] section configured well and
> it's same as nova_api's placement section.
> Other nova containers are started well.
>

Hi Shyam,

The nova code has this:

# NOTE(sbauza): We want the compute node to hard fail if it can't be
# able to provide its resources to the placement API, or it would not
# be able to be eligible as a destination.
if CONF.placement.os_region_name is None:
    raise exception.PlacementNotConfigured()

Do you have the os_region_name option set in [placement] in nova.conf?

> Any thoughts?
> *Shyam Biradar* * Software Engineer | DevOps*
> M +91 8600266938 | shyam.biradar at trilio.io | trilio.io
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From chkumar246 at gmail.com  Thu May  9 12:30:55 2019
From: chkumar246 at gmail.com (Chandan kumar)
Date: Thu, 9 May 2019 18:00:55 +0530
Subject: [tripleo][openstack-ansible] collaboration on os_tempest role
 update 21 - May 09, 2019
Message-ID:

Hello,

Here is the 21st update (Apr 24 to May 09, 2019) on collaboration on the
os_tempest[1] role between the TripleO and OpenStack-Ansible projects.
Due to the Denver Train PTG, we skipped last week's report.

Highlights of Update 21:
* We removed the install_test_requirements flag in os_tempest, as all the
  tempest plugins have their requirements specified in requirements.txt,
  so let's use that instead of test_requirements.txt.
* All the upstream TripleO CI standalone base, scenario1-4 and puppet
  standalone jobs are running tempest using os_tempest. Thanks to Arx for
  porting jobs to os_tempest and odyssey4me for bringing the gate back alive.
>From Denver Train PTG: Wes made a nice os_tempest tripleo asci video: https://asciinema.org/a/rm7LDAs6RAR1xh7oQrp07LaeR OSA project update from summit: https://www.youtube.com/watch?v=JZet1uNAr_o&t=868s Things got merged: os_tempest: * Remove install_test_requirements flag - https://review.opendev.org/657778 * Temporarily set bionic job to non-voting - https://review.opendev.org/657833 os_cinder: * Set glance_api_version=2 in cinder.conf - https://review.opendev.org/653308 Tripleo: * Enable os_tempest in baremetal-full-overcloud-validate playbook - https://review.opendev.org/652983 * Set gather_facts to false while calling tempest playbook - https://review.opendev.org/653702 * Port tripleo-ci-centos-7-scenario001-standalone to os_tempest - https://review.opendev.org/655870 * Port scenario002-standalone-master to os_tempest - https://review.opendev.org/656259 * Port puppet-keystone-tripleo-standalone to os_tempest - https://review.opendev.org/656474 * Port puppet-swift-tripleo-standalone to os_tempest - https://review.opendev.org/656481 * Port puppet-nova-tripleo-standalone to os_tempest - https://review.opendev.org/656480 * Port puppet-neutron-tripleo-standalone job to os_tempest - https://review.opendev.org/656479 * Switch scenario003-standalone job to use os_tempest - https://review.opendev.org/656290 * Port scenario004-standalone-master to os_tempest - https://review.opendev.org/656291 * Port puppet-horizon-tripleo-standalone to os_tempest - https://review.opendev.org/656758 * Port puppet-cinder-tripleo-standalone to os_tempest - https://review.opendev.org/656752 * Port puppet-glance-tripleo-standalone to os_tempest - https://review.opendev.org/656753 * Port standalone job to os_tempest - https://review.opendev.org/656748 Things in progress: os_tempest: * Replace tempestconf job with aio_distro_metal-tempestconf job - https://review.opendev.org/657359 * Update openstack.org -> opendev.org - https://review.opendev.org/654942 * Make smoke tests as a default whitelist tests - https://review.opendev.org/652060 on TripleO/OSA side, we will be working on enabling heat tempest plugin support. Here is the 20th update [2]. Have queries, Feel free to ping us on #tripleo or #openstack-ansible channel. Links: [1.] http://opendev.org/openstack/openstack-ansible-os_tempest [2.] http://lists.openstack.org/pipermail/openstack-discuss/2019-April/005563.html Thanks, Chandan Kumar From witold.bedyk at suse.com Thu May 9 12:35:01 2019 From: witold.bedyk at suse.com (Witek Bedyk) Date: Thu, 9 May 2019 14:35:01 +0200 Subject: [telemetry][monasca][self-healing] Team meeting agenda for tomorrow In-Reply-To: <1894ef89-ea11-0d31-4820-dc1c39ed07b7@debian.org> References: <14ff728c-f19e-e869-90b1-4ff37f7170af@suse.com> <20AC2324-24B6-40D1-A0A4-0382BCE430A7@cern.ch> <1894ef89-ea11-0d31-4820-dc1c39ed07b7@debian.org> Message-ID: <22b57ad4-c737-cef6-b18b-775c0cb9e7a6@suse.com> > But then we need some kind of timeseries framework within OpenStack as a > whole (through an Oslo library?), What would be the requirements and the scope of this framework from your point of view? > and also we must decide on a backend. > Right now, the only serious thing we have is Gnocchi, since influxdb is > gone through the open core model. Or do you have something else to suggest? Monasca can be used as the backend. As TSDB it uses Apache Cassandra with native clustering support or InfluxDB. Monasca uses Apache Kafka as the message queue. It can replicate and partition the measurements into independent InfluxDB instances. 
Additionally Monasca API could act as the load balancer monitoring the healthiness of InfluxDB instances and routing the queries to the assigned shards. We want to work in Train cycle to add upstream all configuration options to allow such setup [1]. Your feedback, comments and contribution are very welcome. Cheers Witek [1] https://storyboard.openstack.org/#!/story/2005620 From jesse at odyssey4.me Thu May 9 12:38:29 2019 From: jesse at odyssey4.me (Jesse Pretorius) Date: Thu, 9 May 2019 12:38:29 +0000 Subject: [all][requirements][stable] requests version bump on stable brances {pike|queens} for CVE-2018-18074 In-Reply-To: References: <20190507203022.ctlwkqh4awa5z3ez@mthode.org> <20190508142758.gbio47mo3f7pfpgz@yuggoth.org> Message-ID: > On 9 May 2019, at 13:10, Thierry Carrez wrote: > > I agree it is a bit of a slippery slope... We historically did not do that (stable branches are a convenience, not a product), because it is a lot of work to track and test vulnerable dependencies across multiple stable branches in a comprehensive manner. > > Why update requests 2.18.4 for CVE-2018-18074, and not Jinja2 2.10.0 for CVE-2019-8341 ? > > I'm not sure doing it on a case-by-case basis is a good idea either, as it might set unreasonable expectations. A lot of operators make use of u-c for source-based builds to ensure consistency in the builds and to ensure that they’re using the same packages as those which were tested upstream. It makes sense to collaborate on something this important as far upstream as possible. If we think of this as a community effort similar to the extended maintenance policy - the development community doesn’t *have* to implement the infrastructure to actively monitor for the vulnerabilities and respond to them. It can be maintained on a best effort basis by those interested in doing so. To limit the effort involved we could agree to limit the scope to only allow changes to the current ‘maintained’ releases. For all other branches we can encourage an upgrade to a ‘maintained’ release by adding a release note. To manage the 'unreasonable expectations’, we should document a policy to this effect. From fungi at yuggoth.org Thu May 9 12:43:01 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 9 May 2019 12:43:01 +0000 Subject: [telemetry] Team meeting agenda for tomorrow In-Reply-To: <20190509083558.GB3547@hilbert.berg.ol> References: <14ff728c-f19e-e869-90b1-4ff37f7170af@suse.com> <20AC2324-24B6-40D1-A0A4-0382BCE430A7@cern.ch> <20190509083558.GB3547@hilbert.berg.ol> Message-ID: <20190509124300.4f7d7qxprq6osasb@yuggoth.org> On 2019-05-09 10:35:58 +0200 (+0200), Matthias Runge wrote: [...] > Unfortunately, having a meetig at 4 am in the morning does not really > work for me. May I kindly request to move the meeting to a more friendly > hour? The World is round, and your "friendly" times are always someone else's "unfriendly" times. Asking the folks interested in participating in the meeting to agree on a consensus timeslot between them is fair, but please don't characterize someone else's locale as "unfriendly" just because it's on the opposite side of the planet from you. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From thierry at openstack.org Thu May 9 12:49:17 2019 From: thierry at openstack.org (Thierry Carrez) Date: Thu, 9 May 2019 14:49:17 +0200 Subject: [tc][searchlight] What does Maintenance Mode mean for a project? 
In-Reply-To: References: Message-ID: <155f4110-df20-3b23-8c68-700e9c3d66f0@openstack.org> Trinh Nguyen wrote: > Currently, in the project details section of Searchlight page [1], it > says we're in the Maintenance Mode. What does that mean? and how we can > update it? Maintenance mode is a project-team tag that teams can choose to apply to themselves. It is documented at: https://governance.openstack.org/tc/reference/tags/status_maintenance-mode.html If you feel like Searchlight is back to a feature development phase, you can ask for it to be changed by proposing a change to https://opendev.org/openstack/governance/src/branch/master/reference/projects.yaml#L3407 -- Thierry Carrez (ttx) From mriedemos at gmail.com Thu May 9 13:02:47 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 9 May 2019 08:02:47 -0500 Subject: Any ideas on fixing bug 1827083 so we can merge code? Message-ID: I'm not sure what is causing the bug [1] but it's failing at a really high rate for about week now. Do we have ideas on the issue? Do we have thoughts on a workaround? Or should we disable the vexxhost-sjc1 provider until it's solved? [1] http://status.openstack.org/elastic-recheck/#1827083 -- Thanks, Matt From mriedemos at gmail.com Thu May 9 13:20:10 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 9 May 2019 08:20:10 -0500 Subject: [nova][neutron][ptg] Summary: Leaking resources when ports are deleted out-of-band In-Reply-To: <1557393539.17816.4@smtp.office365.com> References: <62ef48e0-9425-9191-a648-c1009c1032b7@fried.cc> <1556919312.16566.2@smtp.office365.com> <5f87ea30-0bdf-31a4-a3f5-0e9d201b3665@gmail.com> <1556989044.27606.0@smtp.office365.com> <1557393539.17816.4@smtp.office365.com> Message-ID: <0e10037f-f193-3752-c96e-7ffb536ea187@gmail.com> On 5/9/2019 4:19 AM, Balázs Gibizer wrote: > This sentence based on the reported bug [2]. The reason while Octavia > is unbinding the port in Neutron instead of via Nova is that Nova fails > to detach the interface and unbind the port if the nova-compute is > down. In that bug we discussing if it would be meaningful to do a local > interface detach (unvind port in neutron + deallocate port resource in > placement) in the nova-api if the compute is done similar to the local > server delete. > > [2]https://bugs.launchpad.net/nova/+bug/1827746 Oh OK I was confusing this with deleting the VM while the compute host was down, not detaching the port from the server while the compute was down. Yeah I'm not sure what we'd want to do there. We could obviously do the same thing we do for VM delete in the API while the compute host is down, but could we be leaking things on the compute host in that case if the VIF was never properly unplugged? I'd think that is already an issue for local delete of the VM in the API if the compute comes back up later (maybe there is something in the compute service on startup that will do cleanup, I'm not sure off the top of my head). 
--

Thanks,

Matt

From openstack at fried.cc  Thu May  9 13:39:07 2019
From: openstack at fried.cc (Eric Fried)
Date: Thu, 9 May 2019 08:39:07 -0500
Subject: [placement][nova][ptg] resource provider affinity
In-Reply-To: <489d8cae-9151-5f43-b495-ad51c959a0ea@intel.com>
References: <21aa22e7-be7d-8ecf-b5bd-9c6afcd789f5@fried.cc>
 <27624C30-2BB6-43DF-9613-783674389C0B@fried.cc>
 <1556631941.24201.1@smtp.office365.com>
 <264f10b8-05dc-5280-28af-1f29cae91821@hco.ntt.co.jp>
 <4aa76244-fce0-86f3-a6f5-cd7f4d8cb2f0@fried.cc>
 <03922b54-994e-dcae-8543-7c9c2f75b87d@hco.ntt.co.jp>
 <5fd214e8-4822-53a5-a7d6-622c5133a26f@fried.cc>
 <1CC272501B5BC543A05DB90AA509DED527557F03@fmsmsx122.amr.corp.intel.com>
 <1934f31d-da89-071f-d667-c36d965851ae@fried.cc>
 <489d8cae-9151-5f43-b495-ad51c959a0ea@intel.com>
Message-ID: <145f897e-2744-25b6-596f-43c51982044e@fried.cc>

Sundar-

> Yes. The examples in the storyboard [1] for NUMA affinity use group
> numbers. If that were recast to use named groups, and we wanted NUMA
> affinity apart from device colocation, would that not require a
> different name than T? In short, if you want to express 2 different
> affinities/groupings, perhaps we need to use a name with 2 parts, and
> use 2 different same_subtree clauses. Just pointing out the implications.

That's correct. If we wanted two groupings...

[repeating diagram for context]

CN
 |
 +---NIC1 (trait: I_AM_A_NIC)
 |   |
 |   +-----PF1_1 (trait: CUSTOM_PHYSNET1, inventory: VF=4)
 |   |
 |   +-----PF1_2 (trait: CUSTOM_PHYSNET2, inventory: VF=4)
 |
 +---NIC2 (trait: I_AM_A_NIC)
     |
     +-----PF2_1 (trait: CUSTOM_PHYSNET1, inventory: VF=4)
     |
     +-----PF2_2 (trait: CUSTOM_PHYSNET2, inventory: VF=4)

?resources_TA1=VF:1&required_TA1=CUSTOM_PHYSNET1
&resources_TA2=VF:1&required_TA2=CUSTOM_PHYSNET2
&required_TA3=I_AM_A_NIC
&same_subtree=','.join([
    suffix for suffix in suffixes if suffix.startswith('_TA')])
    # (i.e. '_TA1,_TA2,_TA3')
&resources_TB1=VF:1&required_TB1=CUSTOM_PHYSNET1
&resources_TB2=VF:1&required_TB2=CUSTOM_PHYSNET2
&required_TB3=I_AM_A_NIC
&same_subtree=','.join([
    suffix for suffix in suffixes if suffix.startswith('_TB')])
    # (i.e. '_TB1,_TB2,_TB3')

This would give us four candidates:
- One where TA* is under NIC1 and TB* is under NIC2
- One where TB* is under NIC1 and TA* is under NIC2
- One where everything is under NIC1
- One where everything is under NIC2

This of course leads to some nontrivial questions, like:

- How do we express these groupings from the operator-/user-facing
sources (flavor, port, device_profile, etc.)? Especially when different
pieces come from different sources but still need to be affined to each
other. This is helped by allowing named as opposed to autonumbered
suffixes, which is why we're doing that, but it's still going to be
tricky to do in practice.

- What if we want to express anti-affinity, i.e. limit the response to
just the first two candidates? We discussed being able to say something
like same_subtree=_TA3,!_TB3, but decided to defer that
design/implementation for now. If you want this kind of thing in Train,
you'll have to filter post-Placement.

Thanks,
efried
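As a minimal sketch of how the query string above could be assembled
(assuming Python's standard urllib here; the parameter names simply mirror
the text above and the placement microversion is left out):

    from urllib.parse import urlencode

    suffixes = ['_TA1', '_TA2', '_TA3', '_TB1', '_TB2', '_TB3']
    params = [
        ('resources_TA1', 'VF:1'), ('required_TA1', 'CUSTOM_PHYSNET1'),
        ('resources_TA2', 'VF:1'), ('required_TA2', 'CUSTOM_PHYSNET2'),
        ('required_TA3', 'I_AM_A_NIC'),
        ('resources_TB1', 'VF:1'), ('required_TB1', 'CUSTOM_PHYSNET1'),
        ('resources_TB2', 'VF:1'), ('required_TB2', 'CUSTOM_PHYSNET2'),
        ('required_TB3', 'I_AM_A_NIC'),
        # one same_subtree per grouping, as described above
        ('same_subtree', ','.join(s for s in suffixes if s.startswith('_TA'))),
        ('same_subtree', ','.join(s for s in suffixes if s.startswith('_TB'))),
    ]
    query = urlencode(params)
    # GET /allocation_candidates?<query> against the placement API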
From fungi at yuggoth.org Thu May 9 13:48:09 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 9 May 2019 13:48:09 +0000 Subject: [all][requirements][stable] requests version bump on stable brances {pike|queens} for CVE-2018-18074 In-Reply-To: References: <20190507203022.ctlwkqh4awa5z3ez@mthode.org> <20190508142758.gbio47mo3f7pfpgz@yuggoth.org> Message-ID: <20190509134808.4eqwwjcdxjpt37wh@yuggoth.org> On 2019-05-09 12:38:29 +0000 (+0000), Jesse Pretorius wrote: [...] > A lot of operators make use of u-c for source-based builds to > ensure consistency in the builds and to ensure that they’re using > the same packages as those which were tested upstream. It makes > sense to collaborate on something this important as far upstream > as possible. [...] See, this is what frightens me. We should *strongly* discourage them from doing this, period. If your deployment relies on distribution packages of dependencies then your distro's package maintainers have almost certainly received advance notice of many of these vulnerabilities and have fixes ready for you to download the moment they're made public. They're in most cases selectively backporting the fixes to the versions they carry so as to make them otherwise backward compatible and avoid knock-on effects involving a need to upgrade other transitive dependencies which are not involved in the vulnerability. > If we think of this as a community effort similar to the extended > maintenance policy - the development community doesn’t *have* to > implement the infrastructure to actively monitor for the > vulnerabilities and respond to them. It can be maintained on a > best effort basis by those interested in doing so. By the time we find out and work through the transitive dependency bumps implied by this sort of change (because many of these ~600 dependencies of ours don't backport fixes or maintain multiple stable series of their own and so our only option is to upgrade to the latest version, and this brings with it removal of old features or reliance on newer versions of other transitive dependencies), we're long past public disclosure and the vulnerability has likely been getting exploited in the wild for some time. If a deployer/operator can't rely on our constraints list for having a timely and complete picture of a secure dependency tree then they already need local workarounds which are probably superior regardless. There are also plenty of non-Python dependencies for our software which can have vulnerabilities of their own, and those aren't reflected at all in our constraints lists. How are said users updating those? > To limit the effort involved we could agree to limit the scope to > only allow changes to the current ‘maintained’ releases. For all > other branches we can encourage an upgrade to a ‘maintained’ > release by adding a release note. I still think even that is an abuse of the stable upper constraints lists and in direct conflict with their purpose as a *frozen* snapshot of external dependencies contemporary with the release which allow us to maintain the stability of our test environments for our stable branches. It can't be both that *and* updated with the latest versions of some dependencies because of random bug fixes, security-related or otherwise. > To manage the 'unreasonable expectations’, we should document a > policy to this effect. 
What we should document is that it's unreasonable to attempt to repurpose our stable constraints lists as a security update mechanism for external dependencies, and encourage users to look elsewhere when attempting to find solutions for securing the dependency trees of their deployments. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From openstack at fried.cc Thu May 9 13:49:35 2019 From: openstack at fried.cc (Eric Fried) Date: Thu, 9 May 2019 08:49:35 -0500 Subject: Any ideas on fixing bug 1827083 so we can merge code? In-Reply-To: References: Message-ID: Have we tried changing the URI to https://opendev.org/openstack/requirements/raw/branch/master/upper-constraints.txt to avoid the redirecting? On 5/9/19 8:02 AM, Matt Riedemann wrote: > I'm not sure what is causing the bug [1] but it's failing at a really > high rate for about week now. Do we have ideas on the issue? Do we have > thoughts on a workaround? Or should we disable the vexxhost-sjc1 > provider until it's solved? > > [1] http://status.openstack.org/elastic-recheck/#1827083 > From fungi at yuggoth.org Thu May 9 13:55:17 2019 From: fungi at yuggoth.org (Jeremy Stanley) Date: Thu, 9 May 2019 13:55:17 +0000 Subject: Any ideas on fixing bug 1827083 so we can merge code? In-Reply-To: References: Message-ID: <20190509135517.7j7ccyyxzp2yneun@yuggoth.org> On 2019-05-09 08:49:35 -0500 (-0500), Eric Fried wrote: > Have we tried changing the URI to > https://opendev.org/openstack/requirements/raw/branch/master/upper-constraints.txt > to avoid the redirecting? > > On 5/9/19 8:02 AM, Matt Riedemann wrote: > > I'm not sure what is causing the bug [1] but it's failing at a really > > high rate for about week now. Do we have ideas on the issue? Do we have > > thoughts on a workaround? Or should we disable the vexxhost-sjc1 > > provider until it's solved? > > > > [1] http://status.openstack.org/elastic-recheck/#1827083 I have to assume the bug report itself is misleading. Jobs should be using the on-disk copy of the requirements repository provided by Zuul for this and not retrieving that file over the network. However the problem is presumably DNS resolution not working at all on those nodes, so something is going to break at some point in the job in those cases regardless. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From witold.bedyk at suse.com Thu May 9 14:07:53 2019 From: witold.bedyk at suse.com (Witek Bedyk) Date: Thu, 9 May 2019 16:07:53 +0200 Subject: [monasca] Monasca PTG sessions summary Message-ID: <9f4aa6a9-c69b-c675-b15c-b17a80b64dde@suse.com> Hello Team, I've put together some of the items we've discussed during the PTG last week [1]. Please add or update if anything important is missing or wrong. Thanks again for all your contributions during the Summit and the PTG. 
Cheers Witek [1] https://wiki.openstack.org/wiki/MonascaTrainPTG From pierre-samuel.le-stang at corp.ovh.com Thu May 9 15:14:28 2019 From: pierre-samuel.le-stang at corp.ovh.com (Pierre-Samuel LE STANG) Date: Thu, 9 May 2019 17:14:28 +0200 Subject: [ops] database archiving tool Message-ID: <20190509151428.im2c6dbxpv6hwhyo@corp.ovh.com> Hi all, At OVH we needed to write our own tool that archive data from OpenStack databases to prevent some side effect related to huge tables (slower response time, changing MariaDB query plan) and to answer to some legal aspects. So we started to write a python tool which is called OSArchiver that I briefly presented at Denver few days ago in the "Optimizing OpenStack at large scale" talk. We think that this tool could be helpful to other and are ready to open source it, first we would like to get the opinion of the ops community about that tool. To sum-up OSArchiver is written to work regardless of Openstack project. The tool relies on the fact that soft deleted data are recognizable because of their 'deleted' column which is set to 1 or uuid and 'deleted_at' column which is set to the date of deletion. The points to have in mind about OSArchiver: * There is no knowledge of business objects * One table might be archived if it contains 'deleted' column * Children rows are archived before parents rows * A row can not be deleted if it fails to be archived Here are features already implemented: * Archive data in an other database and/or file (actually SQL and CSV formats are supported) to be easily imported * Delete data from Openstack databases * Customizable (retention, exclude DBs, exclude tables, bulk insert/delete) * Multiple archiving configuration * Dry-run mode * Easily extensible, you can add your own destination module (other file format, remote storage etc...) * Archive and/or delete only mode It also means that by design you can run osarchiver not only on OpenStack databases but also on archived OpenStack databases. Thanks in advance for your feedbacks. -- Pierre-Samuel Le Stang From mriedemos at gmail.com Thu May 9 15:28:03 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 9 May 2019 10:28:03 -0500 Subject: [nova][ptg] Summary: Extra specs validation In-Reply-To: <07673fec-c193-1031-b9f0-5d32c65cc124@fried.cc> References: <07673fec-c193-1031-b9f0-5d32c65cc124@fried.cc> Message-ID: <17e7e0f8-4604-a845-8749-738f588374c1@gmail.com> On 5/2/2019 11:11 PM, Eric Fried wrote: > - Do it in the flavor API when extra specs are set (as opposed to e.g. > during server create) > - One spec, but two stages: > 1) For known keys, validate values; do this without a microversion. > 2) Validate keys, which entails > - Standard set of keys (by pattern) known to nova > - Mechanism for admin to extend the set for snowflake extra specs > specific to their deployment / OOT driver / etc. > - "Validation" will at least comprise messaging/logging. > - Optional "strict mode" making the operation fail is also a possibility. I don't remember agreeing to one spec with two stages for this. If you want to get something approved in workable in Train, validating the values for known keys is low-hanging-fruit. Figuring out how to validate known keys in a way that allows out of tree extra specs to work is going to be a lot more complicated and rat-holey, so I would personally make those separate efforts and separate specs. 
-- Thanks, Matt From moreira.belmiro.email.lists at gmail.com Thu May 9 15:43:49 2019 From: moreira.belmiro.email.lists at gmail.com (Belmiro Moreira) Date: Thu, 9 May 2019 17:43:49 +0200 Subject: [ops] database archiving tool In-Reply-To: <20190509151428.im2c6dbxpv6hwhyo@corp.ovh.com> References: <20190509151428.im2c6dbxpv6hwhyo@corp.ovh.com> Message-ID: Hi Pierre-Samuel, at this point most of the OpenStack projects have their own way to archive/delete soft deleted records. But one thing usually missing is the retention period of soft deleted records and then the archived data. I'm interested to learn more about what you are doing. Is there any link to access the code? Belmiro CERN On Thu, May 9, 2019 at 5:25 PM Pierre-Samuel LE STANG < pierre-samuel.le-stang at corp.ovh.com> wrote: > Hi all, > > At OVH we needed to write our own tool that archive data from OpenStack > databases to prevent some side effect related to huge tables (slower > response > time, changing MariaDB query plan) and to answer to some legal aspects. > > So we started to write a python tool which is called OSArchiver that I > briefly > presented at Denver few days ago in the "Optimizing OpenStack at large > scale" > talk. We think that this tool could be helpful to other and are ready to > open > source it, first we would like to get the opinion of the ops community > about > that tool. > > To sum-up OSArchiver is written to work regardless of Openstack project. > The > tool relies on the fact that soft deleted data are recognizable because of > their 'deleted' column which is set to 1 or uuid and 'deleted_at' column > which > is set to the date of deletion. > > The points to have in mind about OSArchiver: > * There is no knowledge of business objects > * One table might be archived if it contains 'deleted' column > * Children rows are archived before parents rows > * A row can not be deleted if it fails to be archived > > Here are features already implemented: > * Archive data in an other database and/or file (actually SQL and CSV > formats are supported) to be easily imported > * Delete data from Openstack databases > * Customizable (retention, exclude DBs, exclude tables, bulk insert/delete) > * Multiple archiving configuration > * Dry-run mode > * Easily extensible, you can add your own destination module (other file > format, remote storage etc...) > * Archive and/or delete only mode > > It also means that by design you can run osarchiver not only on OpenStack > databases but also on archived OpenStack databases. > > Thanks in advance for your feedbacks. > > -- > Pierre-Samuel Le Stang > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mthode at mthode.org Thu May 9 15:54:55 2019 From: mthode at mthode.org (Matthew Thode) Date: Thu, 9 May 2019 10:54:55 -0500 Subject: [all][requirements][stable] requests version bump on stable brances {pike|queens} for CVE-2018-18074 In-Reply-To: <20190509134808.4eqwwjcdxjpt37wh@yuggoth.org> References: <20190507203022.ctlwkqh4awa5z3ez@mthode.org> <20190508142758.gbio47mo3f7pfpgz@yuggoth.org> <20190509134808.4eqwwjcdxjpt37wh@yuggoth.org> Message-ID: <20190509155455.7wkszge3e7bykgsj@mthode.org> On 19-05-09 13:48:09, Jeremy Stanley wrote: > On 2019-05-09 12:38:29 +0000 (+0000), Jesse Pretorius wrote: > [...] > > A lot of operators make use of u-c for source-based builds to > > ensure consistency in the builds and to ensure that they’re using > > the same packages as those which were tested upstream. 
It makes > > sense to collaborate on something this important as far upstream > > as possible. > [...] > > See, this is what frightens me. We should *strongly* discourage them > from doing this, period. If your deployment relies on distribution > packages of dependencies then your distro's package maintainers have > almost certainly received advance notice of many of these > vulnerabilities and have fixes ready for you to download the moment > they're made public. They're in most cases selectively backporting > the fixes to the versions they carry so as to make them otherwise > backward compatible and avoid knock-on effects involving a need to > upgrade other transitive dependencies which are not involved in the > vulnerability. > To extend on this, I thought that OSA had the ability to override certian constraints (meaning they could run the check and maintain the overrides on their end). > > If we think of this as a community effort similar to the extended > > maintenance policy - the development community doesn’t *have* to > > implement the infrastructure to actively monitor for the > > vulnerabilities and respond to them. It can be maintained on a > > best effort basis by those interested in doing so. > > By the time we find out and work through the transitive dependency > bumps implied by this sort of change (because many of these ~600 > dependencies of ours don't backport fixes or maintain multiple > stable series of their own and so our only option is to upgrade to > the latest version, and this brings with it removal of old features > or reliance on newer versions of other transitive dependencies), > we're long past public disclosure and the vulnerability has likely > been getting exploited in the wild for some time. If a > deployer/operator can't rely on our constraints list for having a > timely and complete picture of a secure dependency tree then they > already need local workarounds which are probably superior > regardless. There are also plenty of non-Python dependencies for our > software which can have vulnerabilities of their own, and those > aren't reflected at all in our constraints lists. How are said users > updating those? > There's also the problem for knock on dependencies. Update foo, which pulls in a new version of bar as required. Either of which can break the world (and on down the dep tree) -- Matthew Thode -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From joseph.davis at suse.com Thu May 9 15:57:03 2019 From: joseph.davis at suse.com (Joseph Davis) Date: Thu, 9 May 2019 08:57:03 -0700 Subject: [telemetry] Team meeting agenda for tomorrow In-Reply-To: <20AC2324-24B6-40D1-A0A4-0382BCE430A7@cern.ch> References: <14ff728c-f19e-e869-90b1-4ff37f7170af@suse.com> <20AC2324-24B6-40D1-A0A4-0382BCE430A7@cern.ch> Message-ID: Hi Tim, I added your question as Proposal C to the roadmap etherpad [1]. Feel free to change it if I got something wrong. :) [1]https://etherpad.openstack.org/p/telemetry-train-roadmap joseph On 5/9/19 12:24 AM, Tim Bell wrote: > > Is it time to rethink the approach to telemetry a bit? > > Having each project provide its telemetry data (such as Swift with > statsd - > https://docs.openstack.org/swift/latest/admin/objectstorage-monitoring.html > > or using a framework like Prometheus)? > > In the end, the projects are the ones who have the best knowledge of > how to get the metrics. 
> > Tim > ** -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Thu May 9 19:50:45 2019 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 9 May 2019 14:50:45 -0500 Subject: [ops] database archiving tool In-Reply-To: References: <20190509151428.im2c6dbxpv6hwhyo@corp.ovh.com> Message-ID: <70050fb8-9d5f-b39c-a46a-af40e8a83ee5@gmail.com> On 5/9/2019 10:43 AM, Belmiro Moreira wrote: > But one thing usually missing is the retention period of soft deleted > records and then the archived data. Something like this? https://review.opendev.org/#/c/556751/ -- Thanks, Matt From cgoncalves at redhat.com Thu May 9 20:59:38 2019 From: cgoncalves at redhat.com (Carlos Goncalves) Date: Thu, 9 May 2019 22:59:38 +0200 Subject: [User-committee] OpenStack User Survey 2019 In-Reply-To: <5CD34F85.9010604@openstack.org> References: <5CC0732E.8020601@tipit.net> <74F9B988-972B-422F-94D1-E62A83FD87A7@openstack.org> <5CD34F85.9010604@openstack.org> Message-ID: Thank you for the prompt replies and action, Allison and Jimmy! After discussing on #openstack-lbaas with the team and drafting on https://etherpad.openstack.org/p/cItdtzi32r, we would like to suggest presenting some multiple choices along with an "Other" free text area. 1. Which OpenStack load balancing (Octavia) provider drivers would you like to see supported? (sorted alphabetically) A10 Networks AVI Networks Amphora Brocade F5 HAProxy Technologies Kemp Netscaler OVN Radware VMware Other (free text area) 2. Which new features would you like to see supported in OpenStack load balancing (Octavia)? (sorted alphabetically) Active-active Container-based amphora driver Event notifications gRPC protocol HTTP/2 protocol Log offloading MySQL protocol Simultaneous IPv4 and IPv6 VIP Statistics (more metrics) VIP ACL API Other (free text area) Thanks, Carlos On Wed, May 8, 2019 at 11:52 PM Jimmy McArthur wrote: > > Carlos, > > Right now these questions are up as free text area. Feel free to send along adjustments if you'd like. > > > > Cheers > Jimmy > > Allison Price May 8, 2019 at 4:30 PM > Hi Carlos, > > Thank you for providing these two questions. We can get them both added, but I did have a question. Are both of these questions intended to be open ended with a text box for respondents to fill in their answers? Or do you want to provide answer choices? (thinking for the first question in particular) With any multiple choice question, an Other option can be included that will trigger a text box to be completed. > > Thanks! >