From natsume.takashi at lab.ntt.co.jp Tue May 1 01:02:48 2018 From: natsume.takashi at lab.ntt.co.jp (Takashi Natsume) Date: Tue, 1 May 2018 10:02:48 +0900 Subject: [Openstack-operators] Need feedback for nova aborting cold migration function Message-ID: Hi everyone, I'm going to add the aborting cold migration function [1] in nova. I would like to ask operators' feedback on this. The cold migration is an administrator operation by default. If administrators perform cold migration and it is stalled out, users cannot do their operations (e.g. starting the VM). In that case, if administrators can abort the cold migration by using this function, it enables users to operate their VMs. If you are a person like the following, would you reply to this mail? * Those who need this function * Those who will use this function if it is implemented * Those who think that it is better to have this function * Those who are interested in this function [1] https://review.openstack.org/#/c/334732/ Regards, Takashi Natsume NTT Software Innovation Center E-mail: natsume.takashi at lab.ntt.co.jp From dh3 at sanger.ac.uk Tue May 1 08:30:33 2018 From: dh3 at sanger.ac.uk (Dave Holland) Date: Tue, 1 May 2018 09:30:33 +0100 Subject: [Openstack-operators] [openstack-dev] [nova] Default scheduler filters survey In-Reply-To: References: Message-ID: <20180501083033.GF9259@sanger.ac.uk> On Mon, Apr 30, 2018 at 12:41:21PM -0400, Mathieu Gagné wrote: > Weighers for baremetal cells: > * ReservedHostForTenantWeigher [7] ... > [7] Used to favor reserved host over non-reserved ones based on project. Hello Mathieu, we are considering writing something like this, for virtual machines not for baremetal. Our use case is that a project buying some compute hardware is happy for others to use it, but when the compute "owner" wants sole use of it, other projects' instances must be migrated off or killed; a scheduler weigher like this might help us to minimise the number of instances needing migration or termination at that point. Would you be willing to share your source code please? thanks, Dave -- ** Dave Holland ** Systems Support -- Informatics Systems Group ** ** 01223 496923 ** Wellcome Sanger Institute, Hinxton, UK ** -- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From Tim.Bell at cern.ch Tue May 1 13:10:56 2018 From: Tim.Bell at cern.ch (Tim Bell) Date: Tue, 1 May 2018 13:10:56 +0000 Subject: [Openstack-operators] [openstack-dev] [nova] Default scheduler filters survey In-Reply-To: <20180501083033.GF9259@sanger.ac.uk> References: <20180501083033.GF9259@sanger.ac.uk> Message-ID: You may also need something like pre-emptible instances to arrange the clean up of opportunistic VMs when the owner needs his resources back. Some details on the early implementation at http://openstack-in-production.blogspot.fr/2018/02/maximizing-resource-utilization-with.html. If you're in Vancouver, we'll be having a Forum session on this (https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21787/pre-emptible-instances-the-way-forward) and notes welcome on the etherpad (https://etherpad.openstack.org/p/YVR18-pre-emptible-instances) It would be good to find common implementations since this is a common scenario in the academic and research communities. 
Tim -----Original Message----- From: Dave Holland Date: Tuesday, 1 May 2018 at 10:40 To: Mathieu Gagné Cc: "OpenStack Development Mailing List (not for usage questions)" , openstack-operators Subject: Re: [Openstack-operators] [openstack-dev] [nova] Default scheduler filters survey On Mon, Apr 30, 2018 at 12:41:21PM -0400, Mathieu Gagné wrote: > Weighers for baremetal cells: > * ReservedHostForTenantWeigher [7] ... > [7] Used to favor reserved host over non-reserved ones based on project. Hello Mathieu, we are considering writing something like this, for virtual machines not for baremetal. Our use case is that a project buying some compute hardware is happy for others to use it, but when the compute "owner" wants sole use of it, other projects' instances must be migrated off or killed; a scheduler weigher like this might help us to minimise the number of instances needing migration or termination at that point. Would you be willing to share your source code please? thanks, Dave -- ** Dave Holland ** Systems Support -- Informatics Systems Group ** ** 01223 496923 ** Wellcome Sanger Institute, Hinxton, UK ** -- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. _______________________________________________ OpenStack-operators mailing list OpenStack-operators at lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators From emilien at redhat.com Tue May 1 14:03:55 2018 From: emilien at redhat.com (Emilien Macchi) Date: Tue, 1 May 2018 07:03:55 -0700 Subject: [Openstack-operators] [openstack-dev] The Forum Schedule is now live In-Reply-To: References: <5AE34A02.8020802@openstack.org> <5AE73AA3.4030408@openstack.org> <5AE74CF2.9010804@openstack.org> Message-ID: On Mon, Apr 30, 2018 at 10:25 AM, Emilien Macchi wrote: > On Mon, Apr 30, 2018 at 10:05 AM, Jimmy McArthur > wrote: >> >> It looks like we have a spot held for you, but did not receive >> confirmation that TripleO would be moving forward with Project Update. If >> you all will be recording this, we have you down for Wednesday from 11:25 - >> 11:45am. Just let me know and I'll get it up on the schedule. >> > > This slot is perfect, and I'll run it with one of my tripleo co-workers > (Alex won't be here). > Jimmy, could you please confirm we have the TripleO Project Updates slot? I don't see it in the schedule. Thanks, -- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimmy at openstack.org Tue May 1 14:18:16 2018 From: jimmy at openstack.org (Jimmy McArthur) Date: Tue, 01 May 2018 09:18:16 -0500 Subject: [Openstack-operators] [openstack-dev] The Forum Schedule is now live In-Reply-To: References: <5AE34A02.8020802@openstack.org> <5AE73AA3.4030408@openstack.org> <5AE74CF2.9010804@openstack.org> Message-ID: <5AE87728.1020804@openstack.org> Apologies for the delay, Emilien! I should be adding it today, but it's definitely yours. > Emilien Macchi > May 1, 2018 at 9:03 AM > > Jimmy, could you please confirm we have the TripleO Project Updates > slot? I don't see it in the schedule. 
> > Thanks, > -- > Emilien Macchi > __________________________________________________________________________ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > Emilien Macchi > April 30, 2018 at 12:25 PM > > This slot is perfect, and I'll run it with one of my tripleo > co-workers (Alex won't be here). > > Thanks, > -- > Emilien Macchi > __________________________________________________________________________ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > Jimmy McArthur > April 30, 2018 at 12:05 PM > Alex, > > It looks like we have a spot held for you, but did not receive > confirmation that TripleO would be moving forward with Project > Update. If you all will be recording this, we have you down for > Wednesday from 11:25 - 11:45am. Just let me know and I'll get it up > on the schedule. > > Thanks! > Jimmy > > > __________________________________________________________________________ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > Alex Schultz > April 30, 2018 at 11:52 AM > On Mon, Apr 30, 2018 at 9:47 AM, Jimmy McArthur wrote: >> Project Updates are in their own track: >> https://www.openstack.org/summit/vancouver-2018/summit-schedule#track=223 >> > > TripleO is still missing? > > Thanks, > -Alex > >> As are SIG, BoF and Working Groups: >> https://www.openstack.org/summit/vancouver-2018/summit-schedule#track=218 >> >> Amy Marrich >> April 30, 2018 at 10:44 AM >> Emilien, >> >> I believe that the Project Updates are separate from the Forum? I know I saw >> some in the schedule before the Forum submittals were even closed. Maybe >> contact speaker support or Jimmy will answer here. >> >> Thanks, >> >> Amy (spotz) >> >> >> _______________________________________________ >> OpenStack-operators mailing list >> OpenStack-operators at lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators >> Emilien Macchi >> April 30, 2018 at 10:33 AM >> >> >>> Hello all - >>> >>> Please take a look here for the posted Forum schedule: >>> https://www.openstack.org/summit/vancouver-2018/summit-schedule#track=224 >>> You should also see it update on your Summit App. >> Why TripleO doesn't have project update? >> Maybe we could combine it with TripleO - Project Onboarding if needed but it >> would be great to have it advertised as a project update! >> >> Thanks, >> -- >> Emilien Macchi >> __________________________________________________________________________ >> OpenStack Development Mailing List (not for usage questions) >> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev >> Jimmy McArthur >> April 27, 2018 at 11:04 AM >> Hello all - >> >> Please take a look here for the posted Forum schedule: >> https://www.openstack.org/summit/vancouver-2018/summit-schedule#track=224 >> You should also see it update on your Summit App. >> >> Thank you and see you in Vancouver! 
>> Jimmy >> >> >> __________________________________________________________________________ >> OpenStack Development Mailing List (not for usage questions) >> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev >> >> >> >> __________________________________________________________________________ >> OpenStack Development Mailing List (not for usage questions) >> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev >> > > __________________________________________________________________________ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > Jimmy McArthur > April 30, 2018 at 10:47 AM > Project Updates are in their own track: > https://www.openstack.org/summit/vancouver-2018/summit-schedule#track=223 > > As are SIG, BoF and Working Groups: > https://www.openstack.org/summit/vancouver-2018/summit-schedule#track=218 > > > __________________________________________________________________________ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev -------------- next part -------------- An HTML attachment was scrubbed... URL: From mgagne at calavera.ca Tue May 1 14:35:13 2018 From: mgagne at calavera.ca (=?UTF-8?Q?Mathieu_Gagn=C3=A9?=) Date: Tue, 1 May 2018 10:35:13 -0400 Subject: [Openstack-operators] [openstack-dev] [nova] Default scheduler filters survey In-Reply-To: <20180501083033.GF9259@sanger.ac.uk> References: <20180501083033.GF9259@sanger.ac.uk> Message-ID: Hi Dave, On Tue, May 1, 2018 at 4:30 AM, Dave Holland wrote: > On Mon, Apr 30, 2018 at 12:41:21PM -0400, Mathieu Gagné wrote: >> Weighers for baremetal cells: >> * ReservedHostForTenantWeigher [7] > ... >> [7] Used to favor reserved host over non-reserved ones based on project. > > Hello Mathieu, > > we are considering writing something like this, for virtual machines not > for baremetal. Our use case is that a project buying some compute > hardware is happy for others to use it, but when the compute "owner" > wants sole use of it, other projects' instances must be migrated off or > killed; a scheduler weigher like this might help us to minimise the > number of instances needing migration or termination at that point. > Would you be willing to share your source code please? > I'm not sure how battle-tested this code is to be honest but here it is: https://gist.github.com/mgagne/659ca02e63779802de6f7aec8cda612a I had to merge 2 files in one (the weigher and the conf) so I'm not sure if it still works but I think you will get the idea. To use it, you need to define the "reserved_for_tenant_id" Ironic node property with the project ID to reserve it. (through Ironic API) This code also assumes you already filtered out hosts which are reserved for a different tenant. I included that code in the gist too. On a side note, our technicians generally use the forced host feature of Nova to target specific Ironic nodes: https://docs.openstack.org/nova/pike/admin/availability-zones.html But if the customer buys and reserves some machines, he should get them first before the ones in the "public pool". 
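For anyone who just wants the general shape of such a weigher without opening the gist, a minimal, untested sketch follows. It is not the actual code from the gist above: it assumes the standard BaseHostWeigher interface from nova.scheduler.weights, assumes the request spec passed to the weigher exposes project_id, and assumes the Ironic node's reserved_for_tenant_id property is surfaced in host_state.stats (adjust that lookup to wherever your host manager exposes it).

from nova.scheduler import weights


class ReservedHostForTenantWeigher(weights.BaseHostWeigher):
    """Favor hosts reserved for the requesting project.

    Returns 1.0 for a host reserved for the request's project and 0.0
    otherwise; the scheduler normalizes the values and applies the
    weigher's multiplier, so reserved hosts sort ahead of the public pool.
    """

    def _weigh_object(self, host_state, spec_obj):
        # Assumption: the Ironic node property ends up in host_state.stats.
        reserved_for = getattr(host_state, 'stats', {}).get(
            'reserved_for_tenant_id')
        if reserved_for and reserved_for == spec_obj.project_id:
            return 1.0
        return 0.0

As noted above, a companion filter is still needed to exclude hosts reserved for a different project before weighing; the weigher only decides ordering among the hosts that remain.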
-- Mathieu From mihalis68 at gmail.com Tue May 1 15:37:33 2018 From: mihalis68 at gmail.com (Chris Morgan) Date: Tue, 1 May 2018 11:37:33 -0400 Subject: [Openstack-operators] ops meetups team : IRC meeting 2018-5-1 Message-ID: Lively meeting today on IRC. Minutes and log are here: Meeting ended Tue May 1 15:01:25 2018 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4) 11:01 AM Minutes: http://eavesdrop.openstack.org/meetings/ops_meetup_team/2018/ops_meetup_team.2018-05-01-14.00.html 11:01 AM Minutes (text): http://eavesdrop.openstack.org/meetings/ops_meetup_team/2018/ops_meetup_team.2018-05-01-14.00.txt 11:01 AM Log: http://eavesdrop.openstack.org/meetings/ops_meetup_team/2018/ops_meetup_team.2018-05-01-14.00.log.html We mostly focused on the upcoming PTG, see https://www.mail-archive.com/openstack-operators at lists.openstack.org/msg10021.html https://www.openstack.org/ptg As a reminder Early Bird: USD $199 (Deadline May 11 at 6:59 UTC) Regular: USD $399 (Deadline August 23 at 6:59 UTC) Late/Onsite: USD $599 Cheers Chris -- Chris Morgan -------------- next part -------------- An HTML attachment was scrubbed... URL: From jimmy at openstack.org Tue May 1 20:20:49 2018 From: jimmy at openstack.org (Jimmy McArthur) Date: Tue, 01 May 2018 15:20:49 -0500 Subject: [Openstack-operators] OpenStack PTG Update Message-ID: <5AE8CC21.1070504@openstack.org> Hello Ops Folks - Wanted to reach out regarding some concerns that have been voiced around the pricing at the PTG. Part of the value of the event is allowing Ops and Devs to co-mingle, collaborate, and work together on solving problems with OpenStack. What we are betting on is the opportunity to show that Ops and Devs together, can make a better OpenStack. While this new price may be higher per attendee than previous ops meetups, attendees do receive a free ticket to the next two Summits. The result is an increased price for Ops Meetup/PTG, while lowering overall attendee costs for the PTG/Ops Meetup + Summits. Additionally, we are going to extend the deadline of the Early Bird offer to May 18, 6:59 UTC. After that time, the price will increase from USD $199 to USD $399. Please keep in mind that the OpenStack Foundation doesn’t profit on these events. Our goal is to provide the absolute best community experience/opportunity/value for the money. In short, we want and need you there! If you are concerned about cost and your organization will not fund your travel, you can apply for Travel Support . If your organization is interested in sponsoring the PTG or supporting attendees through Travel Support, please email ptg at openstack.org . I'm sure there will be plenty of questions. We are happy to host a video conference if it's something that would be of value to the Ops community. Thank you and we look forward to seeing you in Denver! Jimmy -------------- next part -------------- An HTML attachment was scrubbed... URL: From arvindn05 at gmail.com Tue May 1 22:26:58 2018 From: arvindn05 at gmail.com (Arvind N) Date: Tue, 1 May 2018 15:26:58 -0700 Subject: [Openstack-operators] [openstack-dev] [nova][placement] Trying to summarize bp/glance-image-traits scheduling alternatives for rebuild In-Reply-To: References: <221636a9-4b8f-1098-10b8-2240a7cb0ff7@gmail.com> <8eec45ab-f9ed-cd96-51a1-9be78849fb9b@gmail.com> <530903a4-701d-595e-acc3-05369697cf06@gmail.com> Message-ID: Reminder for Operators, Please provide feedback either way. 
In cases of rebuilding of an instance using a different image where the image traits have changed between the original launch and the rebuild, is it reasonable to ask to just re-launch a new instance with the new image? The argument for this approach is that given that the requirements have changed, we want the scheduler to pick and allocate the appropriate host for the instance. The approach above also gives you consistent results vs the other approaches where the rebuild may or may not succeed depending on how the original allocation of resources went. For example(from Alex Xu) ,if you launched an instance on a host which has two SRIOV nic. One is normal SRIOV nic(A), another one with some kind of offload feature(B). So, the original request is: resources=SRIOV_VF:1 The instance gets a VF from the normal SRIOV nic(A). But with a new image, the new request is: resources=SRIOV_VF:1 traits=HW_NIC_OFFLOAD_XX With all the solutions discussed in the thread, a rebuild request like above may or may not succeed depending on whether during the initial launch whether nic A or nic B was allocated. Remember that in rebuild new allocation don't happen, we have to reuse the existing allocations. Given the above background, there seems to be 2 competing options. 1. Fail in the API saying you can't rebuild with a new image with new required traits. 2. Look at the current allocations for the instance and try to match the new requirement from the image with the allocations. With #1, we get consistent results in regards to how rebuilds are treated when the image traits changed. With #2, the rebuild may or may not succeed, depending on how well the original allocations match up with the new requirements. #2 will also need to need to account for handling preferred traits or granular resource traits if we decide to implement them for images at some point... [1] https://specs.openstack.org/openstack/nova-specs/specs/rocky/approved/glance-image-traits.html [2] https://review.openstack.org/#/c/560718/ On Tue, Apr 24, 2018 at 6:26 AM, Sylvain Bauza wrote: > Sorry folks for the late reply, I'll try to also weigh in the Gerrit > change. > > On Tue, Apr 24, 2018 at 2:55 PM, Jay Pipes wrote: > >> On 04/23/2018 05:51 PM, Arvind N wrote: >> >>> Thanks for the detailed options Matt/eric/jay. >>> >>> Just few of my thoughts, >>> >>> For #1, we can make the explanation very clear that we rejected the >>> request because the original traits specified in the original image and the >>> new traits specified in the new image do not match and hence rebuild is not >>> supported. >>> >> >> I believe I had suggested that on the spec amendment patch. Matt had >> concerns about an error message being a poor user experience (I don't >> necessarily disagree with that) and I had suggested a clearer error message >> to try and make that user experience slightly less sucky. >> >> For #3, >>> >>> Even though it handles the nested provider, there is a potential issue. >>> >>> Lets say a host with two SRIOV nic. One is normal SRIOV nic(VF1), >>> another one with some kind of offload feature(VF2).(Described by alex) >>> >>> Initial instance launch happens with VF:1 allocated, rebuild launches >>> with modified request with traits=HW_NIC_OFFLOAD_X, so basically we want >>> the instance to be allocated VF2. >>> >>> But the original allocation happens against VF1 and since in rebuild the >>> original allocations are not changed, we have wrong allocations. >>> >> >> Yep, that is certainly an issue. 
The only solution to this that I can see >> would be to have the conductor ask the compute node to do the pre-flight >> check. The compute node already has the entire tree of providers, their >> inventories and traits, along with information about providers that share >> resources with the compute node. It has this information in the >> ProviderTree object in the reportclient that is contained in the compute >> node resource tracker. >> >> The pre-flight check, if run on the compute node, would be able to grab >> the allocation records for the instance and determine if the required >> traits for the new image are present on the actual resource providers >> allocated against for the instance (and not including any child providers >> not allocated against). >> >> > Yup, that. We also have pre-flight checks for move operations like live > and cold migrations, and I'd really like to keep all the conditionals in > the conductor, because it knows better than the scheduler which operation > is asked. > I'm not really happy with adding more in the scheduler about "yeah, it's a > rebuild, so please do something exceptional", and I'm also not happy with > having a filter (that can be disabled) calling the Placement API. > > >> Or... we chalk this up as a "too bad" situation and just either go with >> option #1 or simply don't care about it. > > > Also, that too. Maybe just provide an error should be enough, nope? > Operators, what do you think ? (cross-calling openstack-operators@) > > -Sylvain > > >> >> Best, >> -jay >> >> ____________________________________________________________ >> ______________ >> OpenStack Development Mailing List (not for usage questions) >> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscrib >> e >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev >> > > > __________________________________________________________________________ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > > -- Arvind N -------------- next part -------------- An HTML attachment was scrubbed... URL: From emilien at redhat.com Wed May 2 05:05:13 2018 From: emilien at redhat.com (Emilien Macchi) Date: Tue, 1 May 2018 22:05:13 -0700 Subject: [Openstack-operators] [openstack-dev] The Forum Schedule is now live In-Reply-To: <5AE87728.1020804@openstack.org> References: <5AE34A02.8020802@openstack.org> <5AE73AA3.4030408@openstack.org> <5AE74CF2.9010804@openstack.org> <5AE87728.1020804@openstack.org> Message-ID: On Tue, May 1, 2018 at 7:18 AM, Jimmy McArthur wrote: > Apologies for the delay, Emilien! I should be adding it today, but it's > definitely yours. > Could we change the title of the slot and actually be a TripleO Project Update session? It would have been great to have the onboarding session but I guess we also have 2 other sessions where we'll have occasions to meet: TripleO Ops and User feedback and TripleO and Ansible integration If it's possible to still have an onboarding session, awesome otherwise it's ok I think we'll deal with it. Thanks, -- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gael.therond at gmail.com Wed May 2 08:52:24 2018 From: gael.therond at gmail.com (Flint WALRUS) Date: Wed, 02 May 2018 08:52:24 +0000 Subject: [Openstack-operators] Need feedback for nova aborting cold migration function In-Reply-To: References: Message-ID: As an operator dealing with platforms that do cold migration I would like to be able to abort and rollback the process. That would give us a better service quality and availability. We do have no choices but to use cold migration on some of our remote sites as they don’t get a unified storage such as CEPH for cost management. Those remote sites have to growth and gain traction before being budgeted for a truly powerful distributed storage backend. Due to such limitations I would love to be able to reduce the time our customers are impacted by such move while doing maintenance or any other jobs requiring us to do a migration. Thanks for the hard work on this topic! Le mar. 1 mai 2018 à 03:03, Takashi Natsume a écrit : > Hi everyone, > > I'm going to add the aborting cold migration function [1] in nova. > I would like to ask operators' feedback on this. > > The cold migration is an administrator operation by default. > If administrators perform cold migration and it is stalled out, > users cannot do their operations (e.g. starting the VM). > > In that case, if administrators can abort the cold migration by using > this function, > it enables users to operate their VMs. > > If you are a person like the following, would you reply to this mail? > > * Those who need this function > * Those who will use this function if it is implemented > * Those who think that it is better to have this function > * Those who are interested in this function > > [1] https://review.openstack.org/#/c/334732/ > > Regards, > Takashi Natsume > NTT Software Innovation Center > E-mail: natsume.takashi at lab.ntt.co.jp > > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dh3 at sanger.ac.uk Wed May 2 09:57:38 2018 From: dh3 at sanger.ac.uk (Dave Holland) Date: Wed, 2 May 2018 10:57:38 +0100 Subject: [Openstack-operators] [openstack-dev] [nova] Default scheduler filters survey In-Reply-To: References: <20180501083033.GF9259@sanger.ac.uk> Message-ID: <20180502095738.GM9259@sanger.ac.uk> Thanks Tim, pre-emptible instances are definitely of interest too. I'll be in Vancouver, hope to meet up at some point. And thanks Mathieu for sharing the code, if we build anything of wider interest I'll try to get it shared. Cheers, Dave -- ** Dave Holland ** Systems Support -- Informatics Systems Group ** ** 01223 496923 ** Wellcome Sanger Institute, Hinxton, UK ** On Tue, May 01, 2018 at 01:10:56PM +0000, Tim Bell wrote: > You may also need something like pre-emptible instances to arrange the clean up of opportunistic VMs when the owner needs his resources back. Some details on the early implementation at http://openstack-in-production.blogspot.fr/2018/02/maximizing-resource-utilization-with.html. 
> > If you're in Vancouver, we'll be having a Forum session on this (https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21787/pre-emptible-instances-the-way-forward) and notes welcome on the etherpad (https://etherpad.openstack.org/p/YVR18-pre-emptible-instances) > > It would be good to find common implementations since this is a common scenario in the academic and research communities. > > Tim > > -----Original Message----- > From: Dave Holland > Date: Tuesday, 1 May 2018 at 10:40 > To: Mathieu Gagné > Cc: "OpenStack Development Mailing List (not for usage questions)" , openstack-operators > Subject: Re: [Openstack-operators] [openstack-dev] [nova] Default scheduler filters survey > > On Mon, Apr 30, 2018 at 12:41:21PM -0400, Mathieu Gagné wrote: > > Weighers for baremetal cells: > > * ReservedHostForTenantWeigher [7] > ... > > [7] Used to favor reserved host over non-reserved ones based on project. > > Hello Mathieu, > > we are considering writing something like this, for virtual machines not > for baremetal. Our use case is that a project buying some compute > hardware is happy for others to use it, but when the compute "owner" > wants sole use of it, other projects' instances must be migrated off or > killed; a scheduler weigher like this might help us to minimise the > number of instances needing migration or termination at that point. > Would you be willing to share your source code please? > > thanks, > Dave > -- > ** Dave Holland ** Systems Support -- Informatics Systems Group ** > ** 01223 496923 ** Wellcome Sanger Institute, Hinxton, UK ** > > > -- > The Wellcome Sanger Institute is operated by Genome Research > Limited, a charity registered in England with number 1021457 and a > company registered in England with number 2742969, whose registered > office is 215 Euston Road, London, NW1 2BE. > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > -- The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is 215 Euston Road, London, NW1 2BE. From jimmy at openstack.org Wed May 2 12:19:24 2018 From: jimmy at openstack.org (Jimmy McArthur) Date: Wed, 02 May 2018 07:19:24 -0500 Subject: [Openstack-operators] [openstack-dev] The Forum Schedule is now live In-Reply-To: References: <5AE34A02.8020802@openstack.org> <5AE73AA3.4030408@openstack.org> <5AE74CF2.9010804@openstack.org> <5AE87728.1020804@openstack.org> Message-ID: <5AE9ACCC.4010200@openstack.org> Emilien Macchi wrote: > Could we change the title of the slot and actually be a TripleO > Project Update session? > It would have been great to have the onboarding session but I guess we > also have 2 other sessions where we'll have occasions to meet: > TripleO Ops and User feedback and TripleO and Ansible integration > > If it's possible to still have an onboarding session, awesome > otherwise it's ok I think we'll deal with it. No problem, we have both on the schedule. I moved the Project Update to 11-11:20 so you can have a few minutes before the Onboarding starts at 11:50. https://www.openstack.org/summit/vancouver-2018/summit-schedule/global-search?t=TripleO Let me know if I can assist further. Thanks! 
Jimmy From emilien at redhat.com Wed May 2 12:53:12 2018 From: emilien at redhat.com (Emilien Macchi) Date: Wed, 2 May 2018 05:53:12 -0700 Subject: [Openstack-operators] [openstack-dev] The Forum Schedule is now live In-Reply-To: <5AE9ACCC.4010200@openstack.org> References: <5AE34A02.8020802@openstack.org> <5AE73AA3.4030408@openstack.org> <5AE74CF2.9010804@openstack.org> <5AE87728.1020804@openstack.org> <5AE9ACCC.4010200@openstack.org> Message-ID: On Wed, May 2, 2018 at 5:19 AM, Jimmy McArthur wrote: > > No problem, we have both on the schedule. I moved the Project Update to > 11-11:20 so you can have a few minutes before the Onboarding starts at > 11:50. > > https://www.openstack.org/summit/vancouver-2018/summit-sched > ule/global-search?t=TripleO > > Let me know if I can assist further. > Everything looks excellent to me now. Thanks for your help! -- Emilien Macchi -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Wed May 2 14:07:02 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 2 May 2018 09:07:02 -0500 Subject: [Openstack-operators] [openstack-dev] [nova][placement] Trying to summarize bp/glance-image-traits scheduling alternatives for rebuild In-Reply-To: References: <221636a9-4b8f-1098-10b8-2240a7cb0ff7@gmail.com> <8eec45ab-f9ed-cd96-51a1-9be78849fb9b@gmail.com> <530903a4-701d-595e-acc3-05369697cf06@gmail.com> Message-ID: <30e8e58b-a2f0-df83-49ba-d4d7a9aeddf3@gmail.com> On 5/1/2018 5:26 PM, Arvind N wrote: > In cases of rebuilding of an instance using a different image where the > image traits have changed between the original launch and the rebuild, > is it reasonable to ask to just re-launch a new instance with the new image? > > The argument for this approach is that given that the requirements have > changed, we want the scheduler to pick and allocate the appropriate host > for the instance. We don't know if the requirements have changed with the new image until we check them. Here is another option: What if the API compares the original image required traits against the new image required traits, and if the new image has required traits which weren't in the original image, then (punt) fail in the API? Then you would at least have a chance to rebuild with a new image that has required traits as long as those required traits are less than or equal to the originally validated traits for the host on which the instance is currently running. > > The approach above also gives you consistent results vs the other > approaches where the rebuild may or may not succeed depending on how the > original allocation of resources went. > Consistently frustrating, I agree. :) Because as a user, I can rebuild with some images (that don't have required traits) and can't rebuild with other images (that do have required traits). I see no difference with this and being able to rebuild (with a new image) some instances (image-backed) and not others (volume-backed). Given that, I expect if we punt on this, someone will just come along asking for the support later. Could be a couple of years from now when everyone has moved on and it then becomes someone else's problem. > For example(from Alex Xu) ,if you launched an instance on a host which > has two SRIOV nic. One is normal SRIOV nic(A), another one with some > kind of offload feature(B). > > So, the original request is: resources=SRIOV_VF:1 The instance gets a VF > from the normal SRIOV nic(A). 
> > But with a new image, the new request is: resources=SRIOV_VF:1 > traits=HW_NIC_OFFLOAD_XX > > With all the solutions discussed in the thread, a rebuild request like > above may or may not succeed depending on whether during the initial > launch whether nic A or nic B was allocated. > > Remember that in rebuild new allocation don't happen, we have to reuse > the existing allocations. > > Given the above background, there seems to be 2 competing options. > > 1. Fail in the API saying you can't rebuild with a new image with new > required traits. > > 2. Look at the current allocations for the instance and try to match the > new requirement from the image with the allocations. > > With #1, we get consistent results in regards to how rebuilds are > treated when the image traits changed. > > With #2, the rebuild may or may not succeed, depending on how well the > original allocations match up with the new requirements. > > #2 will also need to need to account for handling preferred traits or > granular resource traits if we decide to implement them for images at > some point... Option 10: Don't support image-defined traits at all. I know that won't happen though. At this point I'm exhausted with this entire issue and conversation and will probably bow out and need someone else to step in with different perspective, like melwitt or dansmith. All of the solutions are bad in their own way, either because they add technical debt and poor user experience, or because they make rebuild more complicated and harder to maintain for the developers. -- Thanks, Matt From arvindn05 at gmail.com Wed May 2 16:16:23 2018 From: arvindn05 at gmail.com (Arvind N) Date: Wed, 2 May 2018 09:16:23 -0700 Subject: [Openstack-operators] [openstack-dev] [nova][placement] Trying to summarize bp/glance-image-traits scheduling alternatives for rebuild In-Reply-To: <30e8e58b-a2f0-df83-49ba-d4d7a9aeddf3@gmail.com> References: <221636a9-4b8f-1098-10b8-2240a7cb0ff7@gmail.com> <8eec45ab-f9ed-cd96-51a1-9be78849fb9b@gmail.com> <530903a4-701d-595e-acc3-05369697cf06@gmail.com> <30e8e58b-a2f0-df83-49ba-d4d7a9aeddf3@gmail.com> Message-ID: > What if the API compares the original image required traits against the new image required traits, and if the new image has required traits which weren't in the original image, then (punt) fail in the API? Then you would at least have a chance > to rebuild with a new image that has required traits as long as those required traits are less than or equal to the originally validated traits for the host on which the instance is currently running. This is what i was proposing with #1, sorry if it was unclear. Will make it more explicit. 1. Reject the rebuild request indicating that rebuilding with a new image with **different** required traits compared to the original request is not supported. If the new image has the same or reduced set of traits as the old image, then the request will be passed through to the conductor etc Pseudo code > if not set(new_image.traits_required).issubset( set(original_image.traits_required)) > raise exception On Wed, May 2, 2018 at 7:07 AM, Matt Riedemann wrote: > On 5/1/2018 5:26 PM, Arvind N wrote: > >> In cases of rebuilding of an instance using a different image where the >> image traits have changed between the original launch and the rebuild, is >> it reasonable to ask to just re-launch a new instance with the new image? 
>> >> The argument for this approach is that given that the requirements have >> changed, we want the scheduler to pick and allocate the appropriate host >> for the instance. >> > > We don't know if the requirements have changed with the new image until we > check them. > > Here is another option: > > What if the API compares the original image required traits against the > new image required traits, and if the new image has required traits which > weren't in the original image, then (punt) fail in the API? Then you would > at least have a chance to rebuild with a new image that has required traits > as long as those required traits are less than or equal to the originally > validated traits for the host on which the instance is currently running. > > >> The approach above also gives you consistent results vs the other >> approaches where the rebuild may or may not succeed depending on how the >> original allocation of resources went. >> >> > Consistently frustrating, I agree. :) Because as a user, I can rebuild > with some images (that don't have required traits) and can't rebuild with > other images (that do have required traits). > > I see no difference with this and being able to rebuild (with a new image) > some instances (image-backed) and not others (volume-backed). Given that, I > expect if we punt on this, someone will just come along asking for the > support later. Could be a couple of years from now when everyone has moved > on and it then becomes someone else's problem. > > For example(from Alex Xu) ,if you launched an instance on a host which has >> two SRIOV nic. One is normal SRIOV nic(A), another one with some kind of >> offload feature(B). >> >> So, the original request is: resources=SRIOV_VF:1 The instance gets a VF >> from the normal SRIOV nic(A). >> >> But with a new image, the new request is: resources=SRIOV_VF:1 >> traits=HW_NIC_OFFLOAD_XX >> >> With all the solutions discussed in the thread, a rebuild request like >> above may or may not succeed depending on whether during the initial launch >> whether nic A or nic B was allocated. >> >> Remember that in rebuild new allocation don't happen, we have to reuse >> the existing allocations. >> >> Given the above background, there seems to be 2 competing options. >> >> 1. Fail in the API saying you can't rebuild with a new image with new >> required traits. >> >> 2. Look at the current allocations for the instance and try to match the >> new requirement from the image with the allocations. >> >> With #1, we get consistent results in regards to how rebuilds are treated >> when the image traits changed. >> >> With #2, the rebuild may or may not succeed, depending on how well the >> original allocations match up with the new requirements. >> >> #2 will also need to need to account for handling preferred traits or >> granular resource traits if we decide to implement them for images at some >> point... >> > > Option 10: Don't support image-defined traits at all. I know that won't > happen though. > > At this point I'm exhausted with this entire issue and conversation and > will probably bow out and need someone else to step in with different > perspective, like melwitt or dansmith. > > All of the solutions are bad in their own way, either because they add > technical debt and poor user experience, or because they make rebuild more > complicated and harder to maintain for the developers. 
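To make the pseudo code above a bit more concrete, a rough, untested sketch of that API-side check could look like the following. It is only an illustration: it treats image metadata as plain dicts, assumes required traits are expressed as trait:<NAME>=required image properties per the glance-image-traits spec, and the exception type and message are placeholders rather than a final choice.

from nova import exception


def image_required_traits(image_meta):
    # Collect trait names from "trait:<NAME>" properties marked "required".
    props = image_meta.get('properties', {})
    return {key[len('trait:'):] for key, value in props.items()
            if key.startswith('trait:') and value == 'required'}


def check_rebuild_image_traits(original_image_meta, new_image_meta):
    new_traits = image_required_traits(new_image_meta)
    original_traits = image_required_traits(original_image_meta)
    if not new_traits.issubset(original_traits):
        # Reject in the API: the new image asks for traits that were never
        # validated against the host the instance is running on.
        raise exception.Invalid(
            'Rebuild with an image that adds required traits %s is not '
            'supported' % sorted(new_traits - original_traits))

The subset test keeps the behavior consistent: rebuilds with the same or a reduced set of required traits pass through unchanged, and only requests that add new requirements are refused up front.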
> > -- > > Thanks, > > Matt > > > __________________________________________________________________________ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > -- Arvind N -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Wed May 2 16:25:25 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 2 May 2018 11:25:25 -0500 Subject: [Openstack-operators] [nova][ironic] ironic_host_manager and baremetal scheduler options removal Message-ID: The baremetal scheduling options were deprecated in Pike [1] and the ironic_host_manager was deprecated in Queens [2] and is now being removed [3]. Deployments must use resource classes now for baremetal scheduling. [4] The large host subset size value is also no longer needed. [5] I've gone through all of the references to "ironic_host_manager" that I could find in codesearch.o.o and updated projects accordingly [6]. Please reply ASAP to this thread and/or [3] if you have issues with this. [1] https://review.openstack.org/#/c/493052/ [2] https://review.openstack.org/#/c/521648/ [3] https://review.openstack.org/#/c/565805/ [4] https://docs.openstack.org/ironic/latest/install/configure-nova-flavors.html#scheduling-based-on-resource-classes [5] https://review.openstack.org/565736/ [6] https://review.openstack.org/#/q/topic:exact-filters+(status:open+OR+status:merged) -- Thanks, Matt From mgagne at calavera.ca Wed May 2 16:40:56 2018 From: mgagne at calavera.ca (=?UTF-8?Q?Mathieu_Gagn=C3=A9?=) Date: Wed, 2 May 2018 12:40:56 -0400 Subject: [Openstack-operators] [openstack-dev] [nova][ironic] ironic_host_manager and baremetal scheduler options removal In-Reply-To: References: Message-ID: What's the state of caching_scheduler which could still be using those configs? Mathieu On Wed, May 2, 2018 at 12:25 PM, Matt Riedemann wrote: > The baremetal scheduling options were deprecated in Pike [1] and the > ironic_host_manager was deprecated in Queens [2] and is now being removed > [3]. Deployments must use resource classes now for baremetal scheduling. [4] > > The large host subset size value is also no longer needed. [5] > > I've gone through all of the references to "ironic_host_manager" that I > could find in codesearch.o.o and updated projects accordingly [6]. > > Please reply ASAP to this thread and/or [3] if you have issues with this. > > [1] https://review.openstack.org/#/c/493052/ > [2] https://review.openstack.org/#/c/521648/ > [3] https://review.openstack.org/#/c/565805/ > [4] > https://docs.openstack.org/ironic/latest/install/configure-nova-flavors.html#scheduling-based-on-resource-classes > [5] https://review.openstack.org/565736/ > [6] > https://review.openstack.org/#/q/topic:exact-filters+(status:open+OR+status:merged) > From mriedemos at gmail.com Wed May 2 16:49:46 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 2 May 2018 11:49:46 -0500 Subject: [Openstack-operators] [openstack-dev] [nova][ironic] ironic_host_manager and baremetal scheduler options removal In-Reply-To: References: Message-ID: <96f7142b-8838-93f8-d8a7-46ff7010c394@gmail.com> On 5/2/2018 11:40 AM, Mathieu Gagné wrote: > What's the state of caching_scheduler which could still be using those configs? The CachingScheduler has been deprecated since Pike [1]. 
We discussed the CachingScheduler at the Rocky PTG in Dublin [2] and have a TODO to write a nova-manage data migration tool to create allocations in Placement for instances that were scheduled using the CachingScheduler (since Pike) which don't have their own resource allocations set in Placement (remember that starting in Pike the FilterScheduler started creating allocations in Placement rather than the ResourceTracker in nova-compute). If you're running computes that are Ocata or Newton, then the ResourceTracker in the nova-compute service should be creating the allocations in Placement for you, assuming you have the compute service configured to talk to Placement (optional in Newton, required in Ocata). [1] https://review.openstack.org/#/c/492210/ [2] https://etherpad.openstack.org/p/nova-ptg-rocky-placement -- Thanks, Matt From mgagne at calavera.ca Wed May 2 17:00:46 2018 From: mgagne at calavera.ca (=?UTF-8?Q?Mathieu_Gagn=C3=A9?=) Date: Wed, 2 May 2018 13:00:46 -0400 Subject: [Openstack-operators] [openstack-dev] [nova][ironic] ironic_host_manager and baremetal scheduler options removal In-Reply-To: <96f7142b-8838-93f8-d8a7-46ff7010c394@gmail.com> References: <96f7142b-8838-93f8-d8a7-46ff7010c394@gmail.com> Message-ID: On Wed, May 2, 2018 at 12:49 PM, Matt Riedemann wrote: > On 5/2/2018 11:40 AM, Mathieu Gagné wrote: >> >> What's the state of caching_scheduler which could still be using those >> configs? > > > The CachingScheduler has been deprecated since Pike [1]. We discussed the > CachingScheduler at the Rocky PTG in Dublin [2] and have a TODO to write a > nova-manage data migration tool to create allocations in Placement for > instances that were scheduled using the CachingScheduler (since Pike) which > don't have their own resource allocations set in Placement (remember that > starting in Pike the FilterScheduler started creating allocations in > Placement rather than the ResourceTracker in nova-compute). > > If you're running computes that are Ocata or Newton, then the > ResourceTracker in the nova-compute service should be creating the > allocations in Placement for you, assuming you have the compute service > configured to talk to Placement (optional in Newton, required in Ocata). > > [1] https://review.openstack.org/#/c/492210/ > [2] https://etherpad.openstack.org/p/nova-ptg-rocky-placement If one can still run CachingScheduler (even if it's deprecated), I think we shouldn't remove the above options. As you can end up with a broken setup and IIUC no way to migrate to placement since migration script has yet to be written. -- Mathieu From mriedemos at gmail.com Wed May 2 17:39:03 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 2 May 2018 12:39:03 -0500 Subject: [Openstack-operators] [openstack-dev] [nova][ironic] ironic_host_manager and baremetal scheduler options removal In-Reply-To: References: <96f7142b-8838-93f8-d8a7-46ff7010c394@gmail.com> Message-ID: <60821a79-42a4-dfa4-cc65-2fbc068f8b35@gmail.com> On 5/2/2018 12:00 PM, Mathieu Gagné wrote: > If one can still run CachingScheduler (even if it's deprecated), I > think we shouldn't remove the above options. > As you can end up with a broken setup and IIUC no way to migrate to > placement since migration script has yet to be written. You're currently on cells v1 on mitaka right? So you have some time to get this sorted out before getting to Rocky where the IronicHostManager is dropped. 
I know you're just one case, but I don't know how many people are really running the CachingScheduler with ironic either, so it might be rare. It would be nice to get other operator input here, like I'm guessing CERN has their cells carved up so that certain cells are only serving baremetal requests while other cells are only VMs? FWIW, I think we can also backport the data migration CLI to stable branches once we have it available so you can do your migration in let's say Queens before getting to Rocky. -- Thanks, Matt From mgagne at calavera.ca Wed May 2 17:48:06 2018 From: mgagne at calavera.ca (=?UTF-8?Q?Mathieu_Gagn=C3=A9?=) Date: Wed, 2 May 2018 13:48:06 -0400 Subject: [Openstack-operators] [openstack-dev] [nova][ironic] ironic_host_manager and baremetal scheduler options removal In-Reply-To: <60821a79-42a4-dfa4-cc65-2fbc068f8b35@gmail.com> References: <96f7142b-8838-93f8-d8a7-46ff7010c394@gmail.com> <60821a79-42a4-dfa4-cc65-2fbc068f8b35@gmail.com> Message-ID: On Wed, May 2, 2018 at 1:39 PM, Matt Riedemann wrote: > > I know you're just one case, but I don't know how many people are really > running the CachingScheduler with ironic either, so it might be rare. It > would be nice to get other operator input here, like I'm guessing CERN has > their cells carved up so that certain cells are only serving baremetal > requests while other cells are only VMs? I found FilterScheduler to be near impossible to use with Ironic due to the huge number of hypervisors it had to handle. Using CachingScheduler made a huge difference, like day and night. > FWIW, I think we can also backport the data migration CLI to stable branches > once we have it available so you can do your migration in let's say Queens > before getting to Rocky. -- Mathieu From mriedemos at gmail.com Wed May 2 22:45:37 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 2 May 2018 17:45:37 -0500 Subject: [Openstack-operators] [openstack-dev] [nova][placement] Trying to summarize bp/glance-image-traits scheduling alternatives for rebuild In-Reply-To: References: <221636a9-4b8f-1098-10b8-2240a7cb0ff7@gmail.com> <8eec45ab-f9ed-cd96-51a1-9be78849fb9b@gmail.com> <530903a4-701d-595e-acc3-05369697cf06@gmail.com> <30e8e58b-a2f0-df83-49ba-d4d7a9aeddf3@gmail.com> Message-ID: On 5/2/2018 5:39 PM, Jay Pipes wrote: > My personal preference is to add less technical debt and go with a > solution that checks if image traits have changed in nova-api and if so, > simply refuse to perform a rebuild. So, what if when I created my server, the image I used, let's say image1, had required trait A and that fit the host. Then some external service removes (or somehow changes) trait A from the compute node resource provider (because people can and will do this, there are a few vmware specs up that rely on being able to manage traits out of band from nova), and then I rebuild my server with image2 that has required trait A. That would match the original trait A in image1 and we'd say, "yup, lgtm!" and do the rebuild even though the compute node resource provider wouldn't have trait A anymore. Having said that, it could technically happen before traits if the operator changed something on the underlying compute host which invalidated instances running on that host, but I'd think if that happened the operator would be migrating everything off the host and disabling it from scheduling before making whatever that kind of change would be, let's say they change the hypervisor or something less drastic but still image property invalidating. 
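For what it's worth, a rough sketch of the "check the instance's existing allocations" approach (option #2 in Arvind's summary) against the placement REST API could look like the following. This is only an illustration: placement below is assumed to be a keystoneauth1 Adapter already pointed at the placement endpoint, microversion selection and error handling are omitted, and it only does a simple union check across the allocated providers, so it does not address granular or preferred traits.

def allocated_providers_satisfy_traits(placement, consumer_uuid,
                                       required_traits):
    # Fetch the instance's current allocations (a rebuild reuses these).
    allocations = placement.get(
        '/allocations/%s' % consumer_uuid).json().get('allocations', {})

    found = set()
    for rp_uuid in allocations:
        # Look only at providers actually allocated against, not the
        # whole provider tree.
        traits = placement.get(
            '/resource_providers/%s/traits' % rp_uuid).json().get(
            'traits', [])
        found |= set(traits)

    return set(required_traits) <= found

Something like this could be called from nova-api or conductor before casting the rebuild, rejecting the request when it returns False, which is also where the out-of-band trait removal scenario above would surface.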
-- Thanks, Matt From arvindn05 at gmail.com Wed May 2 23:06:03 2018 From: arvindn05 at gmail.com (Arvind N) Date: Wed, 2 May 2018 16:06:03 -0700 Subject: [Openstack-operators] [openstack-dev] [nova][placement] Trying to summarize bp/glance-image-traits scheduling alternatives for rebuild In-Reply-To: References: <221636a9-4b8f-1098-10b8-2240a7cb0ff7@gmail.com> <8eec45ab-f9ed-cd96-51a1-9be78849fb9b@gmail.com> <530903a4-701d-595e-acc3-05369697cf06@gmail.com> <30e8e58b-a2f0-df83-49ba-d4d7a9aeddf3@gmail.com> Message-ID: Isnt this an existing issue with traits specified in flavor as well? Server is created using flavor1 requiring trait A on RP1. Before the rebuild is called, the underlying RP1 can be updated to remove trait A and when a rebuild is requested(regardless of whether the image is updated or not), we skip scheduling and allow the rebuild to go through. Now, even though the flavor1 requests trait A, the underlying RP1 does not have that trait the rebuild will succeed... I think maybe there should be some kind of report or query which runs periodically to ensure continued conformance with respect to instance running and their traits. But since traits are intend to provide hints for scheduling, this is different problem to solve IMO. On Wed, May 2, 2018 at 3:45 PM, Matt Riedemann wrote: > On 5/2/2018 5:39 PM, Jay Pipes wrote: > >> My personal preference is to add less technical debt and go with a >> solution that checks if image traits have changed in nova-api and if so, >> simply refuse to perform a rebuild. >> > > So, what if when I created my server, the image I used, let's say image1, > had required trait A and that fit the host. > > Then some external service removes (or somehow changes) trait A from the > compute node resource provider (because people can and will do this, there > are a few vmware specs up that rely on being able to manage traits out of > band from nova), and then I rebuild my server with image2 that has required > trait A. That would match the original trait A in image1 and we'd say, > "yup, lgtm!" and do the rebuild even though the compute node resource > provider wouldn't have trait A anymore. > > Having said that, it could technically happen before traits if the > operator changed something on the underlying compute host which invalidated > instances running on that host, but I'd think if that happened the operator > would be migrating everything off the host and disabling it from scheduling > before making whatever that kind of change would be, let's say they change > the hypervisor or something less drastic but still image property > invalidating. > > -- > > Thanks, > > Matt > > > __________________________________________________________________________ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > -- Arvind N -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mriedemos at gmail.com Thu May 3 00:47:01 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 2 May 2018 19:47:01 -0500 Subject: [Openstack-operators] [openstack-dev] [nova][ironic] ironic_host_manager and baremetal scheduler options removal In-Reply-To: <60821a79-42a4-dfa4-cc65-2fbc068f8b35@gmail.com> References: <96f7142b-8838-93f8-d8a7-46ff7010c394@gmail.com> <60821a79-42a4-dfa4-cc65-2fbc068f8b35@gmail.com> Message-ID: <356c7795-b31e-4de6-47c6-61949f8a3e95@gmail.com> On 5/2/2018 12:39 PM, Matt Riedemann wrote: > FWIW, I think we can also backport the data migration CLI to stable > branches once we have it available so you can do your migration in let's > say Queens before g FYI, here is the start on the data migration CLI: https://review.openstack.org/#/c/565886/ -- Thanks, Matt From mrhillsman at gmail.com Fri May 4 16:58:17 2018 From: mrhillsman at gmail.com (Melvin Hillsman) Date: Fri, 04 May 2018 16:58:17 +0000 Subject: [Openstack-operators] Reminder: UC Meeting Monday 1800UTC Message-ID: Hey everyone, Please see https://wiki.openstack.org/wiki/Governance/Foundation/UserCommittee for UC meeting info and add additional agenda items if needed. -- Kind regards, Melvin Hillsman mrhillsman at gmail.com mobile: (832) 264-2646 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Mon May 7 07:11:59 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Mon, 7 May 2018 09:11:59 +0200 Subject: [Openstack-operators] octavia on ocata Message-ID: Hello everyone, I'd like to know if anynone has tried to installa octavia lbaas on ocata centos 7 release . If yes, does it work ? Regards Ignazio -------------- next part -------------- An HTML attachment was scrubbed... URL: From shake.chen at gmail.com Mon May 7 07:17:05 2018 From: shake.chen at gmail.com (Shake Chen) Date: Mon, 7 May 2018 15:17:05 +0800 Subject: [Openstack-operators] octavia on ocata In-Reply-To: References: Message-ID: in kolla, ocata, Ocatavia is work. On Mon, May 7, 2018 at 3:11 PM, Ignazio Cassano wrote: > Hello everyone, > I'd like to know if anynone has tried to installa octavia lbaas on ocata > centos 7 release . > If yes, does it work ? > > Regards > Ignazio > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > -- Shake Chen -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Mon May 7 07:18:26 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Mon, 7 May 2018 09:18:26 +0200 Subject: [Openstack-operators] octavia on ocata In-Reply-To: References: Message-ID: Many thanks 2018-05-07 9:17 GMT+02:00 Shake Chen : > in kolla, ocata, Ocatavia is work. > > On Mon, May 7, 2018 at 3:11 PM, Ignazio Cassano > wrote: > >> Hello everyone, >> I'd like to know if anynone has tried to installa octavia lbaas on ocata >> centos 7 release . >> If yes, does it work ? >> >> Regards >> Ignazio >> >> _______________________________________________ >> OpenStack-operators mailing list >> OpenStack-operators at lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators >> >> > > > -- > Shake Chen > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rico.lin.guanyu at gmail.com Mon May 7 10:27:48 2018 From: rico.lin.guanyu at gmail.com (Rico Lin) Date: Mon, 7 May 2018 18:27:48 +0800 Subject: [Openstack-operators] [openstack-dev][heat][all] Heat now migrated to StoryBoard!! In-Reply-To: References: Message-ID: Hi all, I updated more information to this guideline in [1]. Please must take a view on [1] to see what's been updated. will likely to keep update on that etherpad if new Q&A or issue found. Will keep trying to make this process as painless for you as possible, so please endure with us for now, and sorry for any inconvenience *[1] https://etherpad.openstack.org/p/Heat-StoryBoard-Migration-Info * 2018-05-05 12:15 GMT+08:00 Rico Lin : > looping heat-dashboard team > > 2018-05-05 12:02 GMT+08:00 Rico Lin : > >> Dear all Heat members and friends >> >> As you might award, OpenStack projects are scheduled to migrating ([5]) >> from Launchpad to StoryBoard [1]. >> For whom who like to know where to file a bug/blueprint, here are some >> heads up for you. >> >> *What's StoryBoard?* >> StoryBoard is a cross-project task-tracker, contains numbers of >> ``project``, each project contains numbers of ``story`` which you can think >> it as an issue or blueprint. Within each story, contains one or multiple >> ``task`` (task separate stories into the tasks to resolve/implement). To >> learn more about StoryBoard or how to make a good story, you can reference >> [6]. >> >> *How to file a bug?* >> This is actually simple, use your current ubuntu-one id to access to >> storyboard. Then find the corresponding project in [2] and create a story >> to it with a description of your issue. We should try to create tasks which >> to reference with patches in Gerrit. >> >> *How to work on a spec (blueprint)?* >> File a story like you used to file a Blueprint. Create tasks for your >> plan. Also you might want to create a task for adding spec( in heat-spec >> repo) if your blueprint needs documents to explain. >> I still leave current blueprint page open, so if you like to create a >> story from BP, you can still get information. Right now we will start work >> as task-driven workflow, so BPs should act no big difference with a bug in >> StoryBoard (which is a story with many tasks). >> >> *Where should I put my story?* >> We migrate all heat sub-projects to StoryBoard to try to keep the impact >> to whatever you're doing as small as possible. However, if you plan to >> create a new story, *please create it under heat project [4]* and tag it >> with what it might affect with (like python-heatclint, heat-dashboard, >> heat-agents). We do hope to let users focus their stories in one place so >> all stories will get better attention and project maintainers don't need to >> go around separate places to find it. >> >> *How to connect from Gerrit to StoryBoard?* >> We usually use following key to reference Launchpad >> Closes-Bug: ####### >> Partial-Bug: ####### >> Related-Bug: ####### >> >> Now in StoryBoard, you can use following key. >> Task: ###### >> Story: ###### >> you can find more info in [3]. >> >> *What I need to do for my exists bug/bps?* >> Your bug is automatically migrated to StoryBoard, however, the reference >> in your patches ware not, so you need to change your commit message to >> replace the old link to launchpad to new links to StoryBoard. >> >> *Do we still need Launchpad after all this migration are done?* >> As the plan, we won't need Launchpad for heat anymore once we have done >> with migrating. 
Will forbid new bugs/bps filed in Launchpad. Also, try to >> provide new information as many as possible. Hopefully, we can make >> everyone happy. For those newly created bugs during/after migration, don't >> worry we will disallow further create new bugs/bps and do a second migrate >> so we won't missed yours. >> >> [1] https://storyboard.openstack.org/ >> [2] https://storyboard.openstack.org/#!/project_group/82 >> [3] https://docs.openstack.org/infra/manual/developers.html# >> development-workflow >> [4] https://storyboard.openstack.org/#!/project/989 >> [5] https://docs.openstack.org/infra/storyboard/migration.html >> [6] https://docs.openstack.org/infra/storyboard/gui/tasks_ >> stories_tags.html#what-is-a-story >> >> >> >> -- >> May The Force of OpenStack Be With You, >> >> *Rico Lin*irc: ricolin >> >> > > > -- > May The Force of OpenStack Be With You, > > *Rico Lin*irc: ricolin > > -- May The Force of OpenStack Be With You, *Rico Lin*irc: ricolin -------------- next part -------------- An HTML attachment was scrubbed... URL: From amy at demarco.com Mon May 7 20:45:25 2018 From: amy at demarco.com (Amy Marrich) Date: Mon, 7 May 2018 15:45:25 -0500 Subject: [Openstack-operators] OpenStack User Survey Message-ID: Hi everyone, If you’re running OpenStack, please participate in the User Survey to share more about your technology implementations and provide feedback for the community. Please help us spread the word. We're trying to gather as much real-world deployment data as possible to share back with both the operator and developer communities. We have made it easier to complete, and the survey is* now available in 7 languages*—English, German, Indonesian, Japanese, Korean, traditional Chinese and simplified Chinese. Based on feedback from the operator community, we are only conducting one survey this year, collecting submissions until early August. The report will then be published in October prior to the Berlin Summit If you would like OpenStack user data in the meantime, check out the analytics dashboard updates in real time, throughout the year. The information provided is confidential and will only be presented in aggregate unless you consent to make it public. The deadline to complete the survey and be part of the next report is *Friday, August 3 at 23:59 UTC.* - You can login and complete the OpenStack User Survey here: http://www.openstack.org/user-survey - If you’re interested in joining the OpenStack User Survey Working Group to help with the survey analysis, please complete this form: https://openstackfoundation.formstack.com/forms/user_survey_working_group - Help us promote the User Survey: https://twitter.com/OpenStack/status/ 993589356312088577 Please let me know if you have any questions. Cheers, Amy Amy Marrich (spotz) OpenStack User Committee -------------- next part -------------- An HTML attachment was scrubbed... URL: From christian.zunker at codecentric.cloud Tue May 8 06:36:38 2018 From: christian.zunker at codecentric.cloud (Christian Zunker) Date: Tue, 08 May 2018 06:36:38 +0000 Subject: [Openstack-operators] How are you handling billing/chargeback? In-Reply-To: References: <20180312192113.znz4eavfze5zg7yn@redhat.com> Message-ID: Hi, we are running a cloud based on openstack-ansible and now are trying to integrate cloudkitty for billing. Till now we used a self written python script to query ceilometer for needed data, but that got more tedious than we are willing to handle. We hope it gets much easier once cloudkitty is set up. 
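For illustration only, the sort of ad-hoc query such a home-grown script usually wraps looks roughly like the sketch below. The meter name, dates and per-project loop are assumptions (and it presumes the ceilometer v2 API is still enabled and admin credentials are sourced), not a recommendation over CloudKitty:

#!/bin/bash
# rough sketch: hourly statistics for one meter, per project, over one month
START="2018-05-01T00:00:00"
END="2018-06-01T00:00:00"
for project in $(openstack project list -f value -c ID); do
    echo "project ${project}"
    ceilometer statistics -m cpu_util -p 3600 \
        -q "project_id=${project};timestamp>=${START};timestamp<${END}"
done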
regards Christian > From: Lars Kellogg-Stedman > Date: Mo., 12. März 2018 um 20:27 Uhr > Subject: [Openstack-operators] How are you handling billing/chargeback? > To: openstack-operators at lists.openstack.org < > openstack-operators at lists.openstack.org> > > > Hey folks, > > I'm curious what folks out there are using for chargeback/billing in > your OpenStack environment. > > Are you doing any sort of chargeback (or showback)? Are you using (or > have you tried) CloudKitty? Or some other existing project? Have you > rolled your own instead? > > I ask because I am helping out some folks get a handle on the > operational side of their existing OpenStack environment, and they are > interested in but have not yet deployed some sort of reporting > mechanism. > > Thanks, > > > -- > Lars Kellogg-Stedman | larsks @ {irc,twitter,github} > http://blog.oddbit.com/ | > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -- cc cloud GmbH | Hochstr. 11 | 42697 Solingen | Deutschland mobil: +49 175 1068513 www.codecentric.cloud | blog.codecentric.de | www.meettheexperts.de Sitz der Gesellschaft: Solingen | HRB 28640| Amtsgericht Wuppertal Geschäftsführung: Werner Krandick . Rainer Vehns Diese E-Mail einschließlich evtl. beigefügter Dateien enthält vertrauliche und/oder rechtlich geschützte Informationen. Wenn Sie nicht der richtige Adressat sind oder diese E-Mail irrtümlich erhalten haben, informieren Sie bitte sofort den Absender und löschen Sie diese E-Mail und evtl. beigefügter Dateien umgehend. Das unerlaubte Kopieren, Nutzen oder Öffnen evtl. beigefügter Dateien sowie die unbefugte Weitergabe dieser E-Mail ist nicht gestattet. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mihalis68 at gmail.com Tue May 8 15:04:21 2018 From: mihalis68 at gmail.com (Chris Morgan) Date: Tue, 8 May 2018 11:04:21 -0400 Subject: [Openstack-operators] ops meetups team meeting minutes 2018-5-8 Message-ID: Today's Ops Meetups Team meeting was chaired by Shintaro Mizuno. Minutes here: Minutes: http://eavesdrop.openstack.org/meetings/ops_meetup_team/2018/ops_meetup_team.2018-05-08-14.17.html 10:58 AM Minutes (text): http://eavesdrop.openstack.org/meetings/ops_meetup_team/2018/ops_meetup_team.2018-05-08-14.17.txt 10:58 AM Log: http://eavesdrop.openstack.org/meetings/ops_meetup_team/2018/ops_meetup_team.2018-05-08-14.17.log.html Please watch out for further updates about the upcoming Vancouver ops sessions, and also please note that early-bird tickets for the PTG in september will now remain available until May 18th. Chris -- Chris Morgan -------------- next part -------------- An HTML attachment was scrubbed... URL: From martialmichel at datamachines.io Tue May 8 21:59:24 2018 From: martialmichel at datamachines.io (Martial Michel) Date: Tue, 08 May 2018 21:59:24 +0000 Subject: [Openstack-operators] [Scientific] Scientific SIG - IRC meeting Wed 9 at 1100UTC Message-ID: Hello, We will have our IRC meeting in the #openstack-meeting channel at 1100 UTC May 9th. Final agenda will be at: *https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_May_8st_2018 * 1. SIG Cycle Report 1. https://etherpad.openstack.org/p/scientific-sig-report-queens 2. Call for Lighting Talks 1. https://etherpad.openstack.org/p/scientific-sig-vancouver2018-lighting-talks 3. AOB All are welcome. 
Looking forward to seeing you there -- Martial -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Wed May 9 09:09:10 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Wed, 9 May 2018 11:09:10 +0200 Subject: [Openstack-operators] ocata /usr/bin/octavia-diskimage-create.sh -i centos fails Message-ID: Hi all, I am trying to create an octavia amphora image on ocata using the package openstack-octavia-diskimage-create on centos 7 but it fails: diskimage-builder fails to create the disk image - cannot uninstall virtualenv. I read this is a bug and a workaround could be setting DIB_INSTALLTYPE_pip_and_virtualenv to "package". In this case the command reported: Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-obMDl9-build/ Regards Ignazio -------------- next part -------------- An HTML attachment was scrubbed... URL: From cdent+os at anticdent.org Wed May 9 12:42:02 2018 From: cdent+os at anticdent.org (Chris Dent) Date: Wed, 9 May 2018 13:42:02 +0100 (BST) Subject: [Openstack-operators] [nova] [placement] placement extraction session at forum Message-ID: I've started an etherpad related to the Vancouver Forum session on extracting placement from nova. It's mostly just an outline for now but is evolving: https://etherpad.openstack.org/p/YVR-placement-extraction If we can get some real information in there before the session we are much more likely to have a productive session. Please feel free to add any notes or questions you have there. Or on this thread if you prefer. The (potentially overly-optimistic) hope is that we can complete any preparatory work before the end of Rocky and then do the extraction in Stein. If we are willing to accept (please, let's) some form of control plane downtime, data migration issues can be vastly eased. Getting agreement on how that might work is one of the goals of the session. Your input is very much appreciated. -- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent tw: @anticdent From cdent+os at anticdent.org Wed May 9 12:56:58 2018 From: cdent+os at anticdent.org (Chris Dent) Date: Wed, 9 May 2018 13:56:58 +0100 (BST) Subject: [Openstack-operators] [cinder] [placement] cinder + placement forum session etherpad Message-ID: I've started an etherpad for the forum session in Vancouver devoted to discussing the possibility of tracking and allocating resources in Cinder using the Placement service. This is not a done deal. Instead the session is to discuss if it could work and how to make it happen if it seems like a good idea. The etherpad is at https://etherpad.openstack.org/p/YVR-cinder-placement but there's not a great deal there yet. Notably there's no description of how scheduling and resource tracking currently works in Cinder because I have no experience with that. This session is mostly for exploring and sharing information so the value of the etherpad may mostly be in the notes we take at the session, but anything we write in advance will help keep things a bit more structured and focused. If this is a topic of interest for you please add some notes to the etherpad, or if you prefer, here. Thanks.
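To make the discussion a little more concrete, here is a very rough sketch of what modelling a Cinder backend pool in Placement could look like from the command line using the osc-placement plugin. The provider name and sizes are invented, a recent placement API microversion is assumed, and whether Cinder itself would drive these calls is exactly the kind of question for the session:

# create a resource provider for a backend pool and give it DISK_GB inventory
openstack resource provider create cinder-pool-1
openstack resource provider inventory set <pool_rp_uuid> --resource DISK_GB=10240
# a request for a 100G volume would then roughly map to a candidate query
openstack allocation candidate list --resource DISK_GB=100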
-- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent tw: @anticdent From jp.methot at planethoster.info Thu May 10 01:11:06 2018 From: jp.methot at planethoster.info (=?utf-8?Q?Jean-Philippe_M=C3=A9thot?=) Date: Thu, 10 May 2018 10:11:06 +0900 Subject: [Openstack-operators] New project creation fails because of a Nova check in a multi-region cloud Message-ID: Hi, I currently operate a multi-region cloud split between 2 geographic locations. I have updated it to Pike not too long ago, but I've been running into a peculiar issue. Ever since the Pike release, Nova now asks Keystone if a new project exists in Keystone before configuring the project’s quotas. However, there doesn’t seem to be any region restriction regarding which endpoint Nova will query Keystone on. So, right now, if I create a new project in region one, Nova will query Keystone in region two. Because my keystone databases are not synched in real time between each region, the region two Keystone will tell it that the new project doesn't exist, while it exists in region one Keystone. Thinking that this could be a configuration error, I tried setting the region_name in keystone_authtoken, but that didn’t change much of anything. Right now I am thinking this may be a bug. Could someone confirm that this is indeed a bug and not a configuration error? To circumvent this issue, I am considering either modifying the database by hand or trying to implement realtime replication between both Keystone databases. Would there be another solution? (beside modifying the code for the Nova check) Jean-Philippe Méthot Openstack system administrator Administrateur système Openstack PlanetHoster inc. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sagaray at nttdata.co.jp Thu May 10 02:33:16 2018 From: sagaray at nttdata.co.jp (sagaray at nttdata.co.jp) Date: Thu, 10 May 2018 02:33:16 +0000 Subject: [Openstack-operators] Need feedback for nova aborting cold migration function Message-ID: <1525919628734.2105@nttdata.co.jp> Hi Takashi, and guys, We are operating large telco enterprise cloud. We always do the maintenance work on midnight during limited time-slot to minimize impact to our users. Operation planning of cold migration is difficult because cold migration time will vary drastically as it also depends on the load on storage servers at that point of time. If cold migration task stalls for any unknown reasons, operators may decide to cancel it manually. This requires several manual steps to be carried out for recovering from such situation such as kill the copy process, reset-state, stop, and start the VM. If we have the ability to cancel cold migration, we can resume our service safely even though the migration is not complete in the stipulated maintenance time window. As of today, we can solve the above issue by following manual procedure to recover instances from cold migration failure but we still need to follow these steps every time. We can build our own tool to automate this process but we will need to maintain it by ourselves as this feature is not supported by any OpenStack distribution. If Nova supports function to cancel cold migration, it’s definitely going to help us to bring instances back from cold migration failure thus improving service availability to our end users. Secondly, we don’t need to worry about maintaining procedure manual or proprietary tool by ourselves which will be a huge win for us. 
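For readers who have not hit this situation, the manual recovery described above boils down to something like the following sketch. The copy-process pattern, the server UUID placeholder and the exact ordering are deployment-specific assumptions, not an official procedure:

# on the source compute node: stop the stalled disk copy (rsync/scp) for the instance
pkill -f "<stalled copy process for the instance>"
# then clear the stuck migration state and restart the VM
nova reset-state --active <server_uuid>
openstack server stop <server_uuid>
openstack server start <server_uuid>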
We are definitely interested in this function and we would love to see it in the next coming release. Thank you for your hard work. -------------------------------------------------- Yukinori Sagara Platform Engineering Department, NTT DATA Corp. > Hi everyone, > > I'm going to add the aborting cold migration function [1] in nova. > I would like to ask operators' feedback on this. > > The cold migration is an administrator operation by default. > If administrators perform cold migration and it is stalled out, > users cannot do their operations (e.g. starting the VM). > > In that case, if administrators can abort the cold migration by using > this function, > it enables users to operate their VMs. > > If you are a person like the following, would you reply to this mail? > > * Those who need this function > * Those who will use this function if it is implemented > * Those who think that it is better to have this function > * Those who are interested in this function > > [1] https://review.openstack.org/#/c/334732/ > > Regards, > Takashi Natsume > NTT Software Innovation Center > E-mail: natsume.takashi at lab.ntt.co.jp From ignaziocassano at gmail.com Thu May 10 09:58:02 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Thu, 10 May 2018 11:58:02 +0200 Subject: [Openstack-operators] octavia worker on ocata Message-ID: Hello everyone, I've just installed octavia on ocata . All octavia services are running except worker. It reports the following error in worker.log: 2018-05-10 11:33:27.404 121193 ERROR oslo_service.service InvalidTarget: A server's target must have topic and server names specified: 2018-05-10 11:33:27.404 121193 ERROR oslo_service.service Could anyone help me ? Regards Ignazio -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Thu May 10 10:05:57 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Thu, 10 May 2018 12:05:57 +0200 Subject: [Openstack-operators] octavia worker on ocata In-Reply-To: References: Message-ID: I am sorry, I forgot to setup topic attribute in oslo_messaging section. Regards Ignazio 2018-05-10 11:58 GMT+02:00 Ignazio Cassano : > Hello everyone, > I've just installed octavia on ocata . > All octavia services are running except worker. > It reports the following error in worker.log: > > 2018-05-10 11:33:27.404 121193 ERROR oslo_service.service InvalidTarget: A > server's target must have topic and server names specified: server=podto2-octavia> > 2018-05-10 11:33:27.404 121193 ERROR oslo_service.service > > Could anyone help me ? > Regards > Ignazio > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rico.lin.guanyu at gmail.com Thu May 10 10:42:05 2018 From: rico.lin.guanyu at gmail.com (Rico Lin) Date: Thu, 10 May 2018 18:42:05 +0800 Subject: [Openstack-operators] [openstack-dev][heat][all] Heat now migrated to StoryBoard!! In-Reply-To: References: Message-ID: Hi all, As we keep adding more info to the migration guideline [1], you might like to take a look again. And do hope it will make things easier for you. If not, please find me in irc or mail. [1] https://etherpad.openstack.org/p/Heat-StoryBoard-Migration-Info Here's the quick hint for you, your bug id is exactly your story id. 2018-05-07 18:27 GMT+08:00 Rico Lin : > Hi all, > > I updated more information to this guideline in [1]. > Please must take a view on [1] to see what's been updated. > will likely to keep update on that etherpad if new Q&A or issue found. 
> > Will keep trying to make this process as painless for you as possible, > so please endure with us for now, and sorry for any inconvenience > > *[1] https://etherpad.openstack.org/p/Heat-StoryBoard-Migration-Info > * > > 2018-05-05 12:15 GMT+08:00 Rico Lin : > >> looping heat-dashboard team >> >> 2018-05-05 12:02 GMT+08:00 Rico Lin : >> >>> Dear all Heat members and friends >>> >>> As you might award, OpenStack projects are scheduled to migrating ([5]) >>> from Launchpad to StoryBoard [1]. >>> For whom who like to know where to file a bug/blueprint, here are some >>> heads up for you. >>> >>> *What's StoryBoard?* >>> StoryBoard is a cross-project task-tracker, contains numbers of >>> ``project``, each project contains numbers of ``story`` which you can think >>> it as an issue or blueprint. Within each story, contains one or multiple >>> ``task`` (task separate stories into the tasks to resolve/implement). To >>> learn more about StoryBoard or how to make a good story, you can reference >>> [6]. >>> >>> *How to file a bug?* >>> This is actually simple, use your current ubuntu-one id to access to >>> storyboard. Then find the corresponding project in [2] and create a story >>> to it with a description of your issue. We should try to create tasks which >>> to reference with patches in Gerrit. >>> >>> *How to work on a spec (blueprint)?* >>> File a story like you used to file a Blueprint. Create tasks for your >>> plan. Also you might want to create a task for adding spec( in heat-spec >>> repo) if your blueprint needs documents to explain. >>> I still leave current blueprint page open, so if you like to create a >>> story from BP, you can still get information. Right now we will start work >>> as task-driven workflow, so BPs should act no big difference with a bug in >>> StoryBoard (which is a story with many tasks). >>> >>> *Where should I put my story?* >>> We migrate all heat sub-projects to StoryBoard to try to keep the impact >>> to whatever you're doing as small as possible. However, if you plan to >>> create a new story, *please create it under heat project [4]* and tag >>> it with what it might affect with (like python-heatclint, heat-dashboard, >>> heat-agents). We do hope to let users focus their stories in one place so >>> all stories will get better attention and project maintainers don't need to >>> go around separate places to find it. >>> >>> *How to connect from Gerrit to StoryBoard?* >>> We usually use following key to reference Launchpad >>> Closes-Bug: ####### >>> Partial-Bug: ####### >>> Related-Bug: ####### >>> >>> Now in StoryBoard, you can use following key. >>> Task: ###### >>> Story: ###### >>> you can find more info in [3]. >>> >>> *What I need to do for my exists bug/bps?* >>> Your bug is automatically migrated to StoryBoard, however, the reference >>> in your patches ware not, so you need to change your commit message to >>> replace the old link to launchpad to new links to StoryBoard. >>> >>> *Do we still need Launchpad after all this migration are done?* >>> As the plan, we won't need Launchpad for heat anymore once we have done >>> with migrating. Will forbid new bugs/bps filed in Launchpad. Also, try to >>> provide new information as many as possible. Hopefully, we can make >>> everyone happy. For those newly created bugs during/after migration, don't >>> worry we will disallow further create new bugs/bps and do a second migrate >>> so we won't missed yours. 
>>> >>> [1] https://storyboard.openstack.org/ >>> [2] https://storyboard.openstack.org/#!/project_group/82 >>> [3] https://docs.openstack.org/infra/manual/developers.html# >>> development-workflow >>> [4] https://storyboard.openstack.org/#!/project/989 >>> [5] https://docs.openstack.org/infra/storyboard/migration.html >>> [6] https://docs.openstack.org/infra/storyboard/gui/tasks_st >>> ories_tags.html#what-is-a-story >>> >>> >>> >>> -- >>> May The Force of OpenStack Be With You, >>> >>> *Rico Lin*irc: ricolin >>> >>> >> >> >> -- >> May The Force of OpenStack Be With You, >> >> *Rico Lin*irc: ricolin >> >> > > > -- > May The Force of OpenStack Be With You, > > *Rico Lin*irc: ricolin > > -- May The Force of OpenStack Be With You, *Rico Lin*irc: ricolin -------------- next part -------------- An HTML attachment was scrubbed... URL: From natsume.takashi at lab.ntt.co.jp Thu May 10 10:42:14 2018 From: natsume.takashi at lab.ntt.co.jp (Takashi Natsume) Date: Thu, 10 May 2018 19:42:14 +0900 Subject: [Openstack-operators] Need feedback for nova aborting cold migration function In-Reply-To: <1525919628734.2105@nttdata.co.jp> References: <1525919628734.2105@nttdata.co.jp> Message-ID: <5fa86256-c601-91cf-570e-04b63a688b47@lab.ntt.co.jp> Flint and Yukinori, Thank you for your replies! On 2018/05/10 11:33, sagaray at nttdata.co.jp wrote: > Hi Takashi, and guys, > > We are operating large telco enterprise cloud. > > We always do the maintenance work on midnight during limited time-slot to minimize impact to our users. > > Operation planning of cold migration is difficult because cold migration time will vary drastically as it also depends on the load on storage servers at that point of time. If cold migration task stalls for any unknown reasons, operators may decide to cancel it manually. This requires several manual steps to be carried out for recovering from such situation such as kill the copy process, reset-state, stop, and start the VM. If we have the ability to cancel cold migration, we can resume our service safely even though the migration is not complete in the stipulated maintenance time window. > > As of today, we can solve the above issue by following manual procedure to recover instances from cold migration failure but we still need to follow these steps every time. We can build our own tool to automate this process but we will need to maintain it by ourselves as this feature is not supported by any OpenStack distribution. > > If Nova supports function to cancel cold migration, it’s definitely going to help us to bring instances back from cold migration failure thus improving service availability to our end users. Secondly, we don’t need to worry about maintaining procedure manual or proprietary tool by ourselves which will be a huge win for us. > > We are definitely interested in this function and we would love to see it in the next coming release. > > Thank you for your hard work. > > -------------------------------------------------- > Yukinori Sagara > Platform Engineering Department, NTT DATA Corp. > >> Hi everyone, >> >> I'm going to add the aborting cold migration function [1] in nova. >> I would like to ask operators' feedback on this. >> >> The cold migration is an administrator operation by default. >> If administrators perform cold migration and it is stalled out, >> users cannot do their operations (e.g. starting the VM). >> >> In that case, if administrators can abort the cold migration by using >> this function, >> it enables users to operate their VMs. 
>> >> If you are a person like the following, would you reply to this mail? >> >> * Those who need this function >> * Those who will use this function if it is implemented >> * Those who think that it is better to have this function >> * Those who are interested in this function >> >> [1] https://review.openstack.org/#/c/334732/ >> >> Regards, >> Takashi Natsume >> NTT Software Innovation Center >> E-mail: natsume.takashi at lab.ntt.co.jp Regards, Takashi Natsume NTT Software Innovation Center E-mail: natsume.takashi at lab.ntt.co.jp From mriedemos at gmail.com Thu May 10 13:52:00 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 10 May 2018 08:52:00 -0500 Subject: [Openstack-operators] New project creation fails because of a Nova check in a multi-region cloud In-Reply-To: References: Message-ID: <82bad7a4-0f03-fe6d-7179-7f50b42f3502@gmail.com> On 5/9/2018 8:11 PM, Jean-Philippe Méthot wrote: > I currently operate a multi-region cloud split between 2 geographic > locations. I have updated it to Pike not too long ago, but I've been > running into a peculiar issue. Ever since the Pike release, Nova now > asks Keystone if a new project exists in Keystone before configuring the > project’s quotas. However, there doesn’t seem to be any region > restriction regarding which endpoint Nova will query Keystone on. So, > right now, if I create a new project in region one, Nova will query > Keystone in region two. Because my keystone databases are not synched in > real time between each region, the region two Keystone will tell it that > the new project doesn't exist, while it exists in region one Keystone. > > Thinking that this could be a configuration error, I tried setting the > region_name in keystone_authtoken, but that didn’t change much of > anything. Right now I am thinking this may be a bug. Could someone > confirm that this is indeed a bug and not a configuration error? > > To circumvent this issue, I am considering either modifying the database > by hand or trying to implement realtime replication between both > Keystone databases. Would there be another solution? (beside modifying > the code for the Nova check) This is the specific code you're talking about: https://github.com/openstack/nova/blob/stable/pike/nova/api/openstack/identity.py#L35 I don't see region_name as a config option for talking to keystone in Pike: https://docs.openstack.org/nova/pike/configuration/config.html#keystone But it is in Queens: https://docs.openstack.org/nova/queens/configuration/config.html#keystone That was added in this change: https://review.openstack.org/#/c/507693/ But I think what you're saying is, since you have multiple regions, the project could be in any of them at any given time until they synchronize so configuring nova for a specific region isn't probably going to help in this case, right? Isn't this somehow resolved with keystone federation? Granted, I'm not at all a keystone person, but I'd think this isn't a unique problem. 
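For reference, on Queens the [keystone]/region_name option discussed above would end up in nova.conf looking roughly like this (the region name here is only an example):

[keystone]
region_name = RegionOne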
-- Thanks, Matt From mriedemos at gmail.com Thu May 10 13:54:42 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 10 May 2018 08:54:42 -0500 Subject: [Openstack-operators] Need feedback for nova aborting cold migration function In-Reply-To: <1525919628734.2105@nttdata.co.jp> References: <1525919628734.2105@nttdata.co.jp> Message-ID: <0470c0af-6777-2771-35e1-69ee029b485d@gmail.com> On 5/9/2018 9:33 PM, sagaray at nttdata.co.jp wrote: > Operation planning of cold migration is difficult because cold migration time will vary drastically as it also depends on the load on storage servers at that point of time. If cold migration task stalls for any unknown reasons, operators may decide to cancel it manually. What storage backend are you using? What are some reasons that it has stalled in the past? -- Thanks, Matt From thierry at openstack.org Thu May 10 13:56:49 2018 From: thierry at openstack.org (Thierry Carrez) Date: Thu, 10 May 2018 15:56:49 +0200 Subject: [Openstack-operators] [forum] Etherpad for "Ops/Devs: One community" session Message-ID: <0e25c5a4-ef13-f877-0114-ec2468079b03@openstack.org> Hi! I have created an etherpad for the "Ops/Devs: One community" Forum session that will happen in Vancouver on Monday at 4:20pm. https://etherpad.openstack.org/p/YVR-ops-devs-one-community If you are interested in continuing breaking up the community silos and making everyone "contributors" with various backgrounds but a single objective, please add to it and join the session ! -- Thierry Carrez (ttx) From mriedemos at gmail.com Thu May 10 13:59:29 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 10 May 2018 08:59:29 -0500 Subject: [Openstack-operators] Need feedback for nova aborting cold migration function In-Reply-To: <1525919628734.2105@nttdata.co.jp> References: <1525919628734.2105@nttdata.co.jp> Message-ID: <5fea9373-021a-0a2e-ba91-d7fe62bd5ca9@gmail.com> On 5/9/2018 9:33 PM, sagaray at nttdata.co.jp wrote: > We always do the maintenance work on midnight during limited time-slot to minimize impact to our users. Also, why are you doing maintenance with cold migration? Why not do live migration for your maintenance (which already supports the abort function). -- Thanks, Matt From lbragstad at gmail.com Thu May 10 14:26:47 2018 From: lbragstad at gmail.com (Lance Bragstad) Date: Thu, 10 May 2018 09:26:47 -0500 Subject: [Openstack-operators] New project creation fails because of a Nova check in a multi-region cloud In-Reply-To: <82bad7a4-0f03-fe6d-7179-7f50b42f3502@gmail.com> References: <82bad7a4-0f03-fe6d-7179-7f50b42f3502@gmail.com> Message-ID: On 05/10/2018 08:52 AM, Matt Riedemann wrote: > On 5/9/2018 8:11 PM, Jean-Philippe Méthot wrote: >> I currently operate a multi-region cloud split between 2 geographic >> locations. I have updated it to Pike not too long ago, but I've been >> running into a peculiar issue. Ever since the Pike release, Nova now >> asks Keystone if a new project exists in Keystone before configuring >> the project’s quotas. However, there doesn’t seem to be any region >> restriction regarding which endpoint Nova will query Keystone on. So, >> right now, if I create a new project in region one, Nova will query >> Keystone in region two. Because my keystone databases are not synched >> in real time between each region, the region two Keystone will tell >> it that the new project doesn't exist, while it exists in region one >> Keystone. Are both keystone nodes completely separate? Do they share any information? 
>> >> Thinking that this could be a configuration error, I tried setting >> the region_name in keystone_authtoken, but that didn’t change much of >> anything. Right now I am thinking this may be a bug. Could someone >> confirm that this is indeed a bug and not a configuration error? >> >> To circumvent this issue, I am considering either modifying the >> database by hand or trying to implement realtime replication between >> both Keystone databases. Would there be another solution? (beside >> modifying the code for the Nova check) A variant of this just came up as a proposal for the Forum in a couple weeks [0]. A separate proposal was also discussed during this week's keystone meeting [1], which brought up an interesting solution. We should be seeing a specification soon that covers the proposal in greater detail and includes use cases. Either way, both sound like they may be relevant to you. [0] https://etherpad.openstack.org/p/YVR-edge-keystone-brainstorming [1] http://eavesdrop.openstack.org/meetings/keystone/2018/keystone.2018-05-08-16.00.log.html#l-156 > > This is the specific code you're talking about: > > https://github.com/openstack/nova/blob/stable/pike/nova/api/openstack/identity.py#L35 > > > I don't see region_name as a config option for talking to keystone in > Pike: > > https://docs.openstack.org/nova/pike/configuration/config.html#keystone > > But it is in Queens: > > https://docs.openstack.org/nova/queens/configuration/config.html#keystone > > That was added in this change: > > https://review.openstack.org/#/c/507693/ > > But I think what you're saying is, since you have multiple regions, > the project could be in any of them at any given time until they > synchronize so configuring nova for a specific region isn't probably > going to help in this case, right? > > Isn't this somehow resolved with keystone federation? Granted, I'm not > at all a keystone person, but I'd think this isn't a unique problem. Without knowing a whole lot about the current setup, I'm inclined to say it is. Keystone-to-keystone federation was developed for cases like this, and it's been something we've been trying to encourage in favor of building replication tooling outside of the database or over an API. The main concerns with taking a manual replication approach is that it could negatively impact overall performance and that keystone already assumes it will be in control of ID generation for most cases (replicating a project in RegionOne into RegionTwo will yield a different project ID, even though it is possible for both to have the same name). Additionally, there are some things that keystone doesn't expose over the API that would need to be replicated, like revocation events (I mentioned this in the etherpad linked above). -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From ignaziocassano at gmail.com Thu May 10 17:45:45 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Thu, 10 May 2018 19:45:45 +0200 Subject: [Openstack-operators] Octavia on ocata centos 7 Message-ID: Hi everyone, I am moving from lbaas v2 based on haproxy driver to octavia on centos 7 ocata. I installed a new host with octavia following the documentation. 
I removed all old load balancers, stopped the lbaas agent and configured neutron following this link: https://docs.openstack.org/octavia/queens/contributor/guides/dev-quick-start.html On the octavia server all services are active, amphora images are installed, but when I try to create a load balancer: neutron lbaas-loadbalancer-create --name lb1 private-subnet it tries to connect to 127.0.0.1:5000 In both octavia.conf and neutron.conf the keystone section is correctly configured to reach the controller address. The old lbaas v2 based on the haproxy driver worked fine before changing the configuration, but it was not possible to protect lbaas addresses with security groups (this is a very old problem) because security groups are applied only to VM ports. Since the Octavia load balancer is based on a VM derived from the amphora image, I'd like to use it to improve my security. Any suggestion for my octavia configuration, or alternatives to improve security on lbaas? Thanks and Regards Ignazio -------------- next part -------------- An HTML attachment was scrubbed... URL: From iain.macdonnell at oracle.com Thu May 10 19:03:57 2018 From: iain.macdonnell at oracle.com (iain MacDonnell) Date: Thu, 10 May 2018 12:03:57 -0700 Subject: Re: [Openstack-operators] Octavia on ocata centos 7 In-Reply-To: References: Message-ID: On 05/10/2018 10:45 AM, Ignazio Cassano wrote: > I am moving from lbaas v2 based on haproxy driver to octavia on centos 7 > ocata. [snip] > On the octavia server all services are active, amphora images are > installed, but when I try to create a load balancer: > > nuutron lbaas-loadbalancer-create --name lb1 private-subnet > > it tries to connect to 127.0.0.1:5000 Google found: https://bugzilla.redhat.com/show_bug.cgi?id=1434904 => https://bugzilla.redhat.com/show_bug.cgi?id=1433728 Seems that you may be missing the service_auth section from neutron_lbaas.conf and/or octavia.conf? I've been through the frustration of trying to get Octavia working. The docs are a bit iffy, and it's ... "still maturing" (from my observation). I think I did have it working with neutron_lbaasv2 at one point. My neutron_lbaas.conf included:

[service_auth]
auth_url = http://mykeystonehost:35357/v3
admin_user = neutron
admin_tenant_name = service
admin_password = n0ttell1nU
admin_user_domain = default
admin_project_domain = default
region = myregion

and octavia.conf:

[service_auth]
memcached_servers = mymemcachedhost:11211
auth_url = http://mykeystonehost:35357
auth_type = password
project_domain_name = default
project_name = service
user_domain_name = default
username = octavia
password = n0ttell1nU

Not sure how correct those are, but IIRC it did basically work. I've since moved to pure Octavia on Queens, where there is no neutron_lbaas. GL! ~iain From ignaziocassano at gmail.com Thu May 10 19:08:53 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Thu, 10 May 2018 19:08:53 +0000 Subject: Re: [Openstack-operators] Octavia on ocata centos 7 In-Reply-To: References: Message-ID: Many thanks for your help. Ignazio On Thu, 10 May 2018 at 21:05, iain MacDonnell wrote: > > > On 05/10/2018 10:45 AM, Ignazio Cassano wrote: > > I am moving from lbaas v2 based on haproxy driver to octavia on centos 7 > > ocata.
> [snip] > > On the octavia server all services are active, amphora images are > > installed, but when I try to create a load balancer: > > > > nuutron lbaas-loadbalancer-create --name lb1 private-subnet > > > > it tries to connect to 127.0.0.1:5000 > > Google found: > > https://bugzilla.redhat.com/show_bug.cgi?id=1434904 => > https://bugzilla.redhat.com/show_bug.cgi?id=1433728 > > Seems that you may be missing the service_auth section from > neutron_lbaas.conf or/and octavia.conf ? > > I've been through the frustration of trying to get Octavia working. The > docs are bit iffy, and it's ... "still maturing" (from my observation). > > I think I did have it working with neutron_lbaasv2 at one point. My > neutron_lbaas.conf included: > > [service_auth] > auth_url = http://mykeystonehost:35357/v3 > admin_user = neutron > admin_tenant_name = service > admin_password = n0ttell1nU > admin_user_domain = default > admin_project_domain = default > region = myregion > > and octavia.conf: > > [service_auth] > memcached_servers = mymemcachedhost:11211 > auth_url = http://mykeystonehost:35357 > auth_type = password > project_domain_name = default > project_name = service > user_domain_name = default > username = octavia > password = n0ttell1nU > > > Not sure how correct those are, but IIRC it did basically work. > > I've since moved to pure Octavia on Queens, where there is no > neutron_lbaas. > > GL! > > ~iain > > > > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jp.methot at planethoster.info Thu May 10 23:30:16 2018 From: jp.methot at planethoster.info (=?utf-8?Q?Jean-Philippe_M=C3=A9thot?=) Date: Fri, 11 May 2018 08:30:16 +0900 Subject: [Openstack-operators] New project creation fails because of a Nova check in a multi-region cloud In-Reply-To: References: <82bad7a4-0f03-fe6d-7179-7f50b42f3502@gmail.com> Message-ID: <1A1729FA-BE7F-4A42-A42F-BC9B772DFE73@planethoster.info> >> >>> I currently operate a multi-region cloud split between 2 geographic >>> locations. I have updated it to Pike not too long ago, but I've been >>> running into a peculiar issue. Ever since the Pike release, Nova now >>> asks Keystone if a new project exists in Keystone before configuring >>> the project’s quotas. However, there doesn’t seem to be any region >>> restriction regarding which endpoint Nova will query Keystone on. So, >>> right now, if I create a new project in region one, Nova will query >>> Keystone in region two. Because my keystone databases are not synched >>> in real time between each region, the region two Keystone will tell >>> it that the new project doesn't exist, while it exists in region one >>> Keystone. > Are both keystone nodes completely separate? Do they share any information? I share the DB information between both. In our use case, we very rarely make changes to keystone (password change, user creation, project creation) and there is a limited number of people who even have access to it, so I can get away with having my main DB in region 1 and hosting an exact copy in region 2. The original idea was to have a mysql slave in region 2, but that failed and we decided to go with manually replicating the keystone DB whenever we would make changes. 
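As an illustration only (the host name is invented, credentials are assumed to come from a client config, and this is not a recommendation over proper replication or federation), the manual copy described above can be as simple as something like:

# one-shot copy of the keystone DB from region 1 to region 2; overwrites the region-2 copy
mysqldump --single-transaction keystone | ssh region2-db mysql keystone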
This means I have the same users and projects in both regions, which is exactly what I want right now for my specific use case. Of course, that also means I only do operations in keystone in Region 1 and never from Region 2 to prevent discrepancies. >>> >>> Thinking that this could be a configuration error, I tried setting >>> the region_name in keystone_authtoken, but that didn’t change much of >>> anything. Right now I am thinking this may be a bug. Could someone >>> confirm that this is indeed a bug and not a configuration error? >>> >>> To circumvent this issue, I am considering either modifying the >>> database by hand or trying to implement realtime replication between >>> both Keystone databases. Would there be another solution? (beside >>> modifying the code for the Nova check) > A variant of this just came up as a proposal for the Forum in a couple > weeks [0]. A separate proposal was also discussed during this week's > keystone meeting [1], which brought up an interesting solution. We > should be seeing a specification soon that covers the proposal in > greater detail and includes use cases. Either way, both sound like they > may be relevant to you. > > [0] https://etherpad.openstack.org/p/YVR-edge-keystone-brainstorming > [1] > http://eavesdrop.openstack.org/meetings/keystone/2018/keystone.2018-05-08-16.00.log.html#l-156 This is interesting. Unfortunately I will not be in Vancouver, but I will keep an eye on it in the future. I will need to find a way to solve the current issue at hand shortly though. >> >> This is the specific code you're talking about: >> >> https://github.com/openstack/nova/blob/stable/pike/nova/api/openstack/identity.py#L35 >> >> >> I don't see region_name as a config option for talking to keystone in >> Pike: >> >> https://docs.openstack.org/nova/pike/configuration/config.html#keystone >> >> But it is in Queens: >> >> https://docs.openstack.org/nova/queens/configuration/config.html#keystone >> >> That was added in this change: >> >> https://review.openstack.org/#/c/507693/ >> >> But I think what you're saying is, since you have multiple regions, >> the project could be in any of them at any given time until they >> synchronize so configuring nova for a specific region isn't probably >> going to help in this case, right? >> >> Isn't this somehow resolved with keystone federation? Granted, I'm not >> at all a keystone person, but I'd think this isn't a unique problem. > Without knowing a whole lot about the current setup, I'm inclined to say > it is. Keystone-to-keystone federation was developed for cases like > this, and it's been something we've been trying to encourage in favor of > building replication tooling outside of the database or over an API. The > main concerns with taking a manual replication approach is that it could > negatively impact overall performance and that keystone already assumes > it will be in control of ID generation for most cases (replicating a > project in RegionOne into RegionTwo will yield a different project ID, > even though it is possible for both to have the same name). > Additionally, there are some things that keystone doesn't expose over > the API that would need to be replicated, like revocation events (I > mentioned this in the etherpad linked above). To answer the questions of both posts: 1.I was talking about the region-name parameter underneath keystone_authtoken. That is in the pike doc you linked, but I am unaware if this is only used for token generation or not. 
Anyhow, it doesn’t seem to have any impact on the issue at hand. 2.My understanding of the issue is this: -Keystone creates new project in region 1 -Nova wants to check if the project exists in keystone, so it asks keystone for its endpoint list. -Nova picks the first endpoint in the list, which happens to be the region 2 endpoint (my endpoint list has the endpoints of both regions since I manage from a single horizon/controller node). -Since there’s no real-time replication, region 2 replies that the project doesn’t exist, while it exists in region 1. I may be wrong about my assumption that it picks the region 2 endpoint, but the facts are that it does query region 2 keystone when it shouldn’t (I see the 404s in the region 2 logs) 3.I haven't really looked into keystone federation yet, but wouldn’t it cause issues if projects in 2 different regions have the same uuid? -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Thu May 10 23:36:06 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 10 May 2018 18:36:06 -0500 Subject: [Openstack-operators] New project creation fails because of a Nova check in a multi-region cloud In-Reply-To: <1A1729FA-BE7F-4A42-A42F-BC9B772DFE73@planethoster.info> References: <82bad7a4-0f03-fe6d-7179-7f50b42f3502@gmail.com> <1A1729FA-BE7F-4A42-A42F-BC9B772DFE73@planethoster.info> Message-ID: <62e79005-7f7b-aa8d-0262-bfc267ca6b3f@gmail.com> On 5/10/2018 6:30 PM, Jean-Philippe Méthot wrote: > 1.I was talking about the region-name parameter underneath > keystone_authtoken. That is in the pike doc you linked, but I am unaware > if this is only used for token generation or not. Anyhow, it doesn’t > seem to have any impact on the issue at hand. The [keystone]/region_name config option in nova is used to pike the identity service endpoint so I think in that case region_one will matter if there are multiple identity endpoints in the service catalog. The only thing is you're on pike where [keystone]/region_name isn't in nova.conf and it's not used, it was added in queens for this lookup: https://review.openstack.org/#/c/507693/ So that might be why it doesn't seem to make a difference if you set it in nova.conf - because the nova code isn't actually using it. You could try backporting that patch into your pike deployment, set region_name to RegionOne and see if it makes a difference (although I thought RegionOne was the default if not specified?). -- Thanks, Matt From jp.methot at planethoster.info Fri May 11 00:04:32 2018 From: jp.methot at planethoster.info (=?utf-8?Q?Jean-Philippe_M=C3=A9thot?=) Date: Fri, 11 May 2018 09:04:32 +0900 Subject: [Openstack-operators] New project creation fails because of a Nova check in a multi-region cloud In-Reply-To: <62e79005-7f7b-aa8d-0262-bfc267ca6b3f@gmail.com> References: <82bad7a4-0f03-fe6d-7179-7f50b42f3502@gmail.com> <1A1729FA-BE7F-4A42-A42F-BC9B772DFE73@planethoster.info> <62e79005-7f7b-aa8d-0262-bfc267ca6b3f@gmail.com> Message-ID: <48915EC3-5BD0-4156-95C8-E67EDEB9AD2F@planethoster.info> > Le 11 mai 2018 à 08:36, Matt Riedemann a écrit : > > On 5/10/2018 6:30 PM, Jean-Philippe Méthot wrote: >> 1.I was talking about the region-name parameter underneath keystone_authtoken. That is in the pike doc you linked, but I am unaware if this is only used for token generation or not. Anyhow, it doesn’t seem to have any impact on the issue at hand. 
> > The [keystone]/region_name config option in nova is used to pike the identity service endpoint so I think in that case region_one will matter if there are multiple identity endpoints in the service catalog. The only thing is you're on pike where [keystone]/region_name isn't in nova.conf and it's not used, it was added in queens for this lookup: > > https://review.openstack.org/#/c/507693/ > > So that might be why it doesn't seem to make a difference if you set it in nova.conf - because the nova code isn't actually using it. I was talking about the parameter under [keystone_authtoken] ([keystone_authtoken]/region_name) and not the new one under [keystone] ([keystone]/region_name). It seems that we were talking about different parameters though, so this explains that. > You could try backporting that patch into your pike deployment, set region_name to RegionOne and see if it makes a difference (although I thought RegionOne was the default if not specified?). I will attempt this next week. Will update if I run into any issues. Also, from experience, most OpenStack services seem to pick a random endpoint when region_name isn't specified in a multi-region cloud. I've seen that several times ever since I built and started maintaining this infrastructure. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Fri May 11 08:57:02 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 11 May 2018 10:57:02 +0200 Subject: [Openstack-operators] octavia on ocata no amphora instances are created Message-ID: Hi everyone, I installed octavia on ocata centos 7. The load balancer, listener and pool are created and they are active, but I cannot see any amphora instance. There are no errors in the octavia logs. Could anyone help me? Regards Ignazio -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Fri May 11 10:29:30 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 11 May 2018 12:29:30 +0200 Subject: [Openstack-operators] octavia amphora instances on ocata Message-ID: Hi everyone, I installed octavia on ocata centos 7 and now when I create a load balancer amphora instances are automatically created, but there are some problems: 1) the amphora-agent on the amphora instances is in an error state because it needs certificates (must I create the amphora image with certificates on it, or are certificates copied during instance deployment?) 2) health-manager.log reports: Amphora 4e6d19d3-bc19-4882-aeca-4772b069c53b health message reports 0 listeners when 1 expected Please, could anyone explain what happens? Regards Ignazio -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Fri May 11 16:28:35 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Fri, 11 May 2018 18:28:35 +0200 Subject: [Openstack-operators] ocata octavia amphora ssl error Message-ID: Hi everyone, I am trying to configure octavia lbaas on ocata centos 7. When I create the load balancer a VM is created from the amphora image, but the worker log reports:
2018-05-11 17:38:56.013 125607 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.
2018-05-11 17:39:01.016 125607 WARNING octavia.amphorae.drivers.haproxy.rest_api_driver [-] Could not connect to instance. Retrying.
I think it is trying to connect to the amphora instance on port 9443, but in the amphora instance the /var/log/amphora-agent.log file reports the following:
[2018-05-11 15:38:45 +0000] [900] [DEBUG] Failed to send error message.
[2018-05-11 15:38:50 +0000] [900] [DEBUG] Error processing SSL request.
[2018-05-11 15:38:50 +0000] [900] [DEBUG] Invalid request from ip=::ffff:10.138.176.96: [SSL: SSL_HANDSHAKE_FAILURE] ssl handshake failure (_ssl.c:1977)
[2018-05-11 15:38:50 +0000] [900] [DEBUG] Failed to send error message.
[2018-05-11 15:38:55 +0000] [900] [DEBUG] Error processing SSL request.
[2018-05-11 15:38:55 +0000] [900] [DEBUG] Invalid request from ip=::ffff:10.138.176.96: [SSL: SSL_HANDSHAKE_FAILURE] ssl handshake failure (_ssl.c:1977)
[2018-05-11 15:38:55 +0000] [900] [DEBUG] Failed to send error message.
[2018-05-11 15:39:00 +0000] [900] [DEBUG] Error processing SSL request.
[2018-05-11 15:39:00 +0000] [900] [DEBUG] Invalid request from ip=::ffff:10.138.176.96: [SSL: SSL_HANDSHAKE_FAILURE] ssl handshake failure (_ssl.c:1977)
[2018-05-11 15:39:00 +0000] [900] [DEBUG] Failed to send error message.
10.138.176.96 is the address of my controller worker. Security groups allow any protocol on any port, and there are no connection problems between the networks. Probably there are some errors in the certificate creation. Can anyone help me, please? Is it possible to disable SSL for testing? Regards Ignazio -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Mon May 14 14:16:15 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Mon, 14 May 2018 16:16:15 +0200 Subject: [Openstack-operators] ocata gnocchi file system : erasing old data Message-ID: Hi everyone, I am using ocata on centos 7 with ceilometer and gnocchi. The gnocchi backend is nfs and I would like to know if it is possible to remove old data on the backend file system. Some directories on the backend are 6 months old. Please, any suggestion? Thanks and Regards Ignazio -------------- next part -------------- An HTML attachment was scrubbed... URL: From jungleboyj at gmail.com Mon May 14 21:15:53 2018 From: jungleboyj at gmail.com (Jay S Bryant) Date: Mon, 14 May 2018 16:15:53 -0500 Subject: [Openstack-operators] [cinder] forum etherpads now available ... Message-ID: <2caa6f0d-2084-27f5-196e-fdecbf10d6f2@gmail.com> All, I have etherpads created for our Cinder related Forum discussions:
* Tuesday, 5/22 11:00 to 11:40 - Room 221-222 - Cinder High Availability (HA) Discussion - https://etherpad.openstack.org/p/YVR18-cinder-ha-forum
* Tuesday, 5/22 11:50 to 12:30 - Room 221-222 - Multi-attach Introduction and Future Direction - https://etherpad.openstack.org/p/YVR18-cinder-mutiattach-forum
* Wednesday, 5/23 9:40 to 10:30 - Room 221-222 - Cinder's Documentation Discussion - https://etherpad.openstack.org/p/YVR18-cinder-documentation-forum
We also have the session on using the placement service:
* Monday 5/21 16:20 to 17:00 - Planning to use Placement in Cinder - https://etherpad.openstack.org/p/YVR-cinder-placement
Please take some time to look at the etherpads before the forum and add your thoughts/questions for discussion. Thank you! Jay Bryant (jungleboyj) -------------- next part -------------- An HTML attachment was scrubbed...
URL: From gord at live.ca Tue May 15 01:33:24 2018 From: gord at live.ca (gordon chung) Date: Tue, 15 May 2018 01:33:24 +0000 Subject: [Openstack-operators] ocata gnocchi file system : erasing old data In-Reply-To: References: Message-ID: On 2018-05-14 10:16 AM, Ignazio Cassano wrote: > Hi everyone, > I am osing ocata on centos 7 with ceilometer and gnocchi. > The gnocchi backend is nfs and I would like to know if it is possible > remove old data on the backend file system. > Dome directories on the backend are 6 months old. > > Please, any suggestion ? there isn't a way to this without manually modifying the files (which can get a bit sketchy). Gnocchi is designed to capture (at most) the amount of data you define in your policy and does not prune data based on 'now' so it won't shrink on it's own over time. i guess we could support a tool that could prune based on 'now' but that doesn't exist currently. the safest way to clean old data is to delete the metric. if you want to delete only some of the metric that will be difficult. it'll require you figuring out what time range of data is stored in a given file (which is not difficult if you look at code), then properly deserialising, pruning, and reserialising (probably difficult or at least annoying). cheers, -- gord From mizuno.shintaro at lab.ntt.co.jp Tue May 15 06:06:29 2018 From: mizuno.shintaro at lab.ntt.co.jp (Shintaro Mizuno) Date: Tue, 15 May 2018 15:06:29 +0900 Subject: [Openstack-operators] [Forum] "DPDK/SR-IOV NFV Operational issues and way forward" session etherpad Message-ID: <0eda6a49-352d-5c04-da87-3f1ae72516ac@lab.ntt.co.jp> Hi I have created an etherpad page for "DPDK/SR-IOV NFV Operational issues and way forward" session at the Vancouver Forum [1]. It will take place on Wed 23, 11:50am - 12:30pm Vancouver Convention Centre West - Level Two - Room 221-222 If you are using/testing DPDK/SR-IOV for NFV workloads and interested in discussing their pros/cons and possible next steps for NFV operators and developers, please come join the session. Please also add your comment/topic proposals to the etherpad beforehand. [1] https://etherpad.openstack.org/p/YVR-dpdk-sriov-way-forward Any input is highly appreciated. Regards, Shintaro -- Shintaro MIZUNO (水野伸太郎) NTT Software Innovation Center TEL: 0422-59-4977 E-mail: mizuno.shintaro at lab.ntt.co.jp From ignaziocassano at gmail.com Tue May 15 06:40:05 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Tue, 15 May 2018 08:40:05 +0200 Subject: [Openstack-operators] ocata gnocchi file system : erasing old data In-Reply-To: References: Message-ID: Hi Gordon, please, let me to understand better.... I am collecting only instance metrics: I need to remove data about instances that ah been removed. I could do that with: gnocchi resource list --type instance -c id -f value if the instance has been delete I could: gnocchi resource delete instance id Does the above procedure remove data either from database or /var/lib/gnocchi directory ? Any suggestion ? Thanks and Regards Ignazio 2018-05-15 3:33 GMT+02:00 gordon chung : > > > On 2018-05-14 10:16 AM, Ignazio Cassano wrote: > > Hi everyone, > > I am osing ocata on centos 7 with ceilometer and gnocchi. > > The gnocchi backend is nfs and I would like to know if it is possible > > remove old data on the backend file system. > > Dome directories on the backend are 6 months old. > > > > Please, any suggestion ? > > there isn't a way to this without manually modifying the files (which > can get a bit sketchy). 
Gnocchi is designed to capture (at most) the > amount of data you define in your policy and does not prune data based > on 'now' so it won't shrink on it's own over time. > > i guess we could support a tool that could prune based on 'now' but that > doesn't exist currently. > > the safest way to clean old data is to delete the metric. if you want to > delete only some of the metric that will be difficult. it'll require you > figuring out what time range of data is stored in a given file (which is > not difficult if you look at code), then properly deserialising, > pruning, and reserialising (probably difficult or at least annoying). > > cheers, > > -- > gord > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sagaray at nttdata.co.jp Tue May 15 08:48:18 2018 From: sagaray at nttdata.co.jp (sagaray at nttdata.co.jp) Date: Tue, 15 May 2018 08:48:18 +0000 Subject: [Openstack-operators] Need feedback for nova aborting cold migration function In-Reply-To: <5fea9373-021a-0a2e-ba91-d7fe62bd5ca9@gmail.com> References: <1525919628734.2105@nttdata.co.jp>, <5fea9373-021a-0a2e-ba91-d7fe62bd5ca9@gmail.com> Message-ID: <1526374144863.89140@nttdata.co.jp> Hi Matt, > On 5/9/2018 9:33 PM, sagaray at nttdata.co.jp wrote: > > Operation planning of cold migration is difficult because cold migration time will vary drastically as it also depends on the load on storage servers at that point of time. If cold migration task stalls for any unknown reasons, operators may decide to cancel it manually. > > What storage backend are you using? What are some reasons that it has > stalled in the past? Our storage backend is EMC VNX, and we have not shared the instance-store storage among compute nodes. The storage is also accessed by external system. We store the service logs which are created by VM on that storage. Our system needs to backup those logs by transferring to other storage. Those logs sometimes becomes very large, and the load of storage also becomes high. In those situation, migrating the VM takes more time than expected in advance, so we would like to cancel some migration task on the way if maintenance time being close to the end. > On 5/9/2018 9:33 PM, sagaray at nttdata.co.jp wrote: > > We always do the maintenance work on midnight during limited time-slot to minimize impact to our users. > > Also, why are you doing maintenance with cold migration? Why not do live > migration for your maintenance (which already supports the abort function). We would like to migrate stopped servers as it is. As the reason above, we think we can operate the system more flexible if we able to cancel cold-migration as live-migration can. -------------------------------------------------- Yukinori Sagara Platform Engineering Department, NTT DATA Corp. ________________________________________ 差出人: Matt Riedemann 送信日時: 2018年5月10日 22:59 宛先: openstack-operators at lists.openstack.org 件名: Re: [Openstack-operators] Need feedback for nova aborting cold migration function On 5/9/2018 9:33 PM, sagaray at nttdata.co.jp wrote: > We always do the maintenance work on midnight during limited time-slot to minimize impact to our users. Also, why are you doing maintenance with cold migration? Why not do live migration for your maintenance (which already supports the abort function). 
-- Thanks, Matt _______________________________________________ OpenStack-operators mailing list OpenStack-operators at lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators From martialmichel at datamachines.io Tue May 15 20:15:40 2018 From: martialmichel at datamachines.io (Martial Michel) Date: Tue, 15 May 2018 16:15:40 -0400 Subject: [Openstack-operators] [scientific] Scientific SIG - IRC meeting Tue 15th at 2100 UTC Message-ID: Hello, With a late email invitation, we will have our IRC meeting in the #openstack-meeting channel at 2100 UTC May 15th. Final agenda will be at https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_May_15th_2018 and to be a repeat of last week in preparation for the summit next week 1. SIG Cycle Report 1. https://etherpad.openstack.org/p/scientific-sig-report-queens 2. Call for Lighting Talks 1. https://etherpad.openstack.org/p/scientific -sig-vancouver2018-lighting-talks 3. AOB All are welcome. Looking forward to seeing you there -- Martial -------------- next part -------------- An HTML attachment was scrubbed... URL: From mebert at uvic.ca Tue May 15 20:58:57 2018 From: mebert at uvic.ca (Marcus Ebert) Date: Tue, 15 May 2018 13:58:57 -0700 (PDT) Subject: [Openstack-operators] [Openstack-sigs] [scientific] Scientific SIG - IRC meeting Tue 15th at 2100 UTC In-Reply-To: References: Message-ID: Hello all, I'm new to the list, so I would like to give a short introduction to what we do: I'm with the HEP Research Computing group at UVic, where we utilize (Openstack) clouds for the computing needs of different High Energy Physics groups. We don't use just single clouds but work on a system that unifies all clouds available to us in a way that it looks like a single computing resource for the user jobs, and for that it also handles the distribution of needed images to the different clouds. In addition, we are working on a system that unifies cloud storage on different clouds into a unified storage space with a single endpoint for all user jobs on any cloud, no matter on which clouds the data ends up or from where it is read. Although we have this storage federation in production now, it is still mainly work in progress. more general information can be found here: http://heprc.phys.uvic.ca/ https://heprc.blogspot.ca/ Unfortunately, I can't join the IRC meeting today, but will be at the summit next week in Vancouver. Could you please let me know which Scientific SIG activities are planned for it and on which days? (from the schedule, it's just Wednesday morning?) Cheers, Marcus On Tue, 15 May 2018, Martial Michel wrote: > Hello, > > With a late email invitation, we will have our IRC meeting in the > #openstack-meeting > channel at 2100 UTC May 15th. > Final agenda will be at > https://wiki.openstack.org/wiki/Scientific_SIG#IRC_Meeting_May_15th_2018 > and to be a repeat of last week in preparation for the summit next week > > > 1. SIG Cycle Report > 1. https://etherpad.openstack.org/p/scientific-sig-report-queens > 2. Call for Lighting Talks > 1. https://etherpad.openstack.org/p/scientific > -sig-vancouver2018-lighting-talks > 3. AOB > > > All are welcome. Looking forward to seeing you there -- Martial > From rico.lin.guanyu at gmail.com Wed May 16 06:18:09 2018 From: rico.lin.guanyu at gmail.com (Rico Lin) Date: Wed, 16 May 2018 14:18:09 +0800 Subject: [Openstack-operators] [openstack-dev][heat][all] Heat now migrated to StoryBoard!! 
In-Reply-To: References: Message-ID: Bump the last time Hi all, As we keep adding more info to the migration guideline [1], you might like to take a look again. And do hope it will make things easier for you. If not, please find me in irc or mail. [1] https://etherpad.openstack.org/p/Heat-StoryBoard-Migration-Info 2018-05-10 18:42 GMT+08:00 Rico Lin : > Hi all, > As we keep adding more info to the migration guideline [1], you might like > to take a look again. > And do hope it will make things easier for you. If not, please find me in > irc or mail. > > [1] https://etherpad.openstack.org/p/Heat-StoryBoard-Migration-Info > > Here's the quick hint for you, your bug id is exactly your story id. > > 2018-05-07 18:27 GMT+08:00 Rico Lin : > >> Hi all, >> >> I updated more information to this guideline in [1]. >> Please must take a view on [1] to see what's been updated. >> will likely to keep update on that etherpad if new Q&A or issue found. >> >> Will keep trying to make this process as painless for you as possible, >> so please endure with us for now, and sorry for any inconvenience >> >> *[1] https://etherpad.openstack.org/p/Heat-StoryBoard-Migration-Info >> * >> >> 2018-05-05 12:15 GMT+08:00 Rico Lin : >> >>> looping heat-dashboard team >>> >>> 2018-05-05 12:02 GMT+08:00 Rico Lin : >>> >>>> Dear all Heat members and friends >>>> >>>> As you might award, OpenStack projects are scheduled to migrating ([5]) >>>> from Launchpad to StoryBoard [1]. >>>> For whom who like to know where to file a bug/blueprint, here are some >>>> heads up for you. >>>> >>>> *What's StoryBoard?* >>>> StoryBoard is a cross-project task-tracker, contains numbers of >>>> ``project``, each project contains numbers of ``story`` which you can think >>>> it as an issue or blueprint. Within each story, contains one or multiple >>>> ``task`` (task separate stories into the tasks to resolve/implement). To >>>> learn more about StoryBoard or how to make a good story, you can reference >>>> [6]. >>>> >>>> *How to file a bug?* >>>> This is actually simple, use your current ubuntu-one id to access to >>>> storyboard. Then find the corresponding project in [2] and create a story >>>> to it with a description of your issue. We should try to create tasks which >>>> to reference with patches in Gerrit. >>>> >>>> *How to work on a spec (blueprint)?* >>>> File a story like you used to file a Blueprint. Create tasks for your >>>> plan. Also you might want to create a task for adding spec( in heat-spec >>>> repo) if your blueprint needs documents to explain. >>>> I still leave current blueprint page open, so if you like to create a >>>> story from BP, you can still get information. Right now we will start work >>>> as task-driven workflow, so BPs should act no big difference with a bug in >>>> StoryBoard (which is a story with many tasks). >>>> >>>> *Where should I put my story?* >>>> We migrate all heat sub-projects to StoryBoard to try to keep the >>>> impact to whatever you're doing as small as possible. However, if you plan >>>> to create a new story, *please create it under heat project [4]* and >>>> tag it with what it might affect with (like python-heatclint, >>>> heat-dashboard, heat-agents). We do hope to let users focus their stories >>>> in one place so all stories will get better attention and project >>>> maintainers don't need to go around separate places to find it. 
>>>> >>>> *How to connect from Gerrit to StoryBoard?* >>>> We usually use following key to reference Launchpad >>>> Closes-Bug: ####### >>>> Partial-Bug: ####### >>>> Related-Bug: ####### >>>> >>>> Now in StoryBoard, you can use following key. >>>> Task: ###### >>>> Story: ###### >>>> you can find more info in [3]. >>>> >>>> *What I need to do for my exists bug/bps?* >>>> Your bug is automatically migrated to StoryBoard, however, the >>>> reference in your patches ware not, so you need to change your commit >>>> message to replace the old link to launchpad to new links to StoryBoard. >>>> >>>> *Do we still need Launchpad after all this migration are done?* >>>> As the plan, we won't need Launchpad for heat anymore once we have done >>>> with migrating. Will forbid new bugs/bps filed in Launchpad. Also, try to >>>> provide new information as many as possible. Hopefully, we can make >>>> everyone happy. For those newly created bugs during/after migration, don't >>>> worry we will disallow further create new bugs/bps and do a second migrate >>>> so we won't missed yours. >>>> >>>> [1] https://storyboard.openstack.org/ >>>> [2] https://storyboard.openstack.org/#!/project_group/82 >>>> [3] https://docs.openstack.org/infra/manual/developers.html# >>>> development-workflow >>>> [4] https://storyboard.openstack.org/#!/project/989 >>>> [5] https://docs.openstack.org/infra/storyboard/migration.html >>>> [6] https://docs.openstack.org/infra/storyboard/gui/tasks_st >>>> ories_tags.html#what-is-a-story >>>> >>>> >>>> >>>> -- >>>> May The Force of OpenStack Be With You, >>>> >>>> *Rico Lin*irc: ricolin >>>> >>>> >>> >>> >>> -- >>> May The Force of OpenStack Be With You, >>> >>> *Rico Lin*irc: ricolin >>> >>> >> >> >> -- >> May The Force of OpenStack Be With You, >> >> *Rico Lin*irc: ricolin >> >> > > > -- > May The Force of OpenStack Be With You, > > *Rico Lin*irc: ricolin > > -- May The Force of OpenStack Be With You, *Rico Lin*irc: ricolin -------------- next part -------------- An HTML attachment was scrubbed... URL: From eumel at arcor.de Wed May 16 10:43:06 2018 From: eumel at arcor.de (Frank Kloeker) Date: Wed, 16 May 2018 12:43:06 +0200 Subject: [Openstack-operators] [I18n] [Docs] Forum session Vancouver Message-ID: <8d2e092118bf02c028c17151b8a34af5@arcor.de> Good morning, just a quick note when packing the suitcase: We have a Docs/I18n Forum session on Monday 21th, 13:30, direct after lunch [1]. Take the chance to discuss topics about project onboarding with translation or documentation, usage of translated documents or tools. Or just come to say Hello :-) Looking forward to see you there! kind regards Frank (PTL I18n) [1] https://etherpad.openstack.org/p/docs-i18n-project-onboarding-vancouver From gord at live.ca Wed May 16 14:45:55 2018 From: gord at live.ca (gordon chung) Date: Wed, 16 May 2018 14:45:55 +0000 Subject: [Openstack-operators] ocata gnocchi file system : erasing old data In-Reply-To: References: Message-ID: On 2018-05-15 2:40 AM, Ignazio Cassano wrote: > gnocchi resource delete instance id > > > Does the above procedure remove data either from database or > /var/lib/gnocchi directory ? not immediately, it will mark the data for deletion. there is a 'janitor' service that runs periodically that will remove the data. this is defined by the `metric_cleanup_delay` in the configuration file. 
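Putting gordon's explanation together with the commands Ignazio listed earlier in the thread, here is a rough sketch of the cleanup flow; the [metricd] section name and the delay value shown are assumptions to double-check against your own gnocchi.conf:

  # list instance resources known to gnocchi together with their end date;
  # a non-empty ended_at generally means the instance was deleted in nova
  gnocchi resource list --type instance -c id -c ended_at

  # deleting a resource marks its metrics for deletion; the metricd janitor
  # then removes the measure files under /var/lib/gnocchi on its next pass
  gnocchi resource delete <resource-id>

  # gnocchi.conf -- how often the janitor wakes up (seconds)
  [metricd]
  metric_cleanup_delay = 300
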
cheers, -- gord From radu.popescu at emag.ro Wed May 16 15:30:59 2018 From: radu.popescu at emag.ro (Radu Popescu | eMAG, Technology) Date: Wed, 16 May 2018 15:30:59 +0000 Subject: [Openstack-operators] attaching network cards to VMs taking a very long time Message-ID: Hi all, we have the following setup: - Openstack Ocata deployed with Openstack Ansible (v15.1.7) - 66 compute nodes, each having between 50 and 150 VMs, depending on their hardware configuration - we don't use Ceilometer (so not adding extra load on RabbitMQ cluster) - using Openvswitch HA with DVR - all messaging are going through a 3 servers RabbitMQ cluster - we now have 3 CCs hosting (initially had 2) hosting every other internal service What happens is, when we create a large number of VMs (it's something we do on a daily basis, just to test different types of VMs and apps, around 300 VMs), there are some of them that don't get the network interface attached in a reasonable time. After investigating, we can see that Neutron Openvswitch agent sees the port attached to the server, from an Openstack point of view, I can see the tap interface created in Openvswitch using both its logs and dmesg, but I can see nova attaching the interface after a huge amount of time. (I could see even 45 minutes delay) Since I can't see any reasonable errors I could take care of, my last chance is this mailing list. Only thing I can think of, is that maybe libvirt is not able to attach the interface in a reasonable amount of time. But still, 45 minutes is way too much. At the moment: vif_plugging_is_fatal = True vif_plugging_timeout = 600 (modified from default 300s) That's because we needed VMs with networking. Otherwise, if either with error, either with no network, it's the same thing for us. Thanks, -- Radu Popescu > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Wed May 16 18:37:50 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Wed, 16 May 2018 20:37:50 +0200 Subject: [Openstack-operators] ocata gnocchi file system : erasing old data In-Reply-To: References: Message-ID: Many thanks Ignazio 2018-05-16 16:45 GMT+02:00 gordon chung : > > > On 2018-05-15 2:40 AM, Ignazio Cassano wrote: > > gnocchi resource delete instance id > > > > > > Does the above procedure remove data either from database or > > /var/lib/gnocchi directory ? > > not immediately, it will mark the data for deletion. there is a > 'janitor' service that runs periodically that will remove the data. this > is defined by the `metric_cleanup_delay` in the configuration file. > > cheers, > > -- > gord > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ignaziocassano at gmail.com Wed May 16 18:46:06 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Wed, 16 May 2018 20:46:06 +0200 Subject: [Openstack-operators] ocata octavia http loadbalancer error 503 Service Unavailable Message-ID: Hi everyone, I am using octavia on centos7 ocata. When I define a http load balancer it does not work. Accessing the load balancer address returns 503 error. The above happens when the load balancer protocol specified is HTTP. If the load balancer protocol specified is TCP , it works. Probably the amphora instance haproxy is facing some issue with http health check ? Could anyone help me , please ? Thanks & Regards Ignazio -------------- next part -------------- An HTML attachment was scrubbed... 
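A 503 served straight from the VIP usually means haproxy considers every pool member down, so the HTTP health monitor settings are the first thing to check when TCP works but HTTP does not. A rough illustration with the neutron-lbaas CLI available on Ocata; the option names are from memory and may differ slightly with your client version, and the pool name is a placeholder:

  # see whether the members behind the pool are being marked down
  neutron lbaas-member-list <pool-name-or-id>

  # an HTTP monitor that only expects a plain 200 from / on each member;
  # a mismatch between url-path/expected-codes and what the backends
  # actually return is a common cause of "TCP works, HTTP returns 503"
  neutron lbaas-healthmonitor-create --type HTTP --http-method GET \
    --url-path / --expected-codes 200 --delay 5 --timeout 3 \
    --max-retries 3 --pool <pool-name-or-id>
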
URL: From mriedemos at gmail.com Wed May 16 21:09:42 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 16 May 2018 16:09:42 -0500 Subject: [Openstack-operators] attaching network cards to VMs taking a very long time In-Reply-To: References: Message-ID: On 5/16/2018 10:30 AM, Radu Popescu | eMAG, Technology wrote: > but I can see nova attaching the interface after a huge amount of time. What specifically are you looking for in the logs when you see this? Are you passing pre-created ports to attach to nova or are you passing a network ID so nova will create the port for you during the attach call? This is where the ComputeManager calls the driver to plug the vif on the host: https://github.com/openstack/nova/blob/stable/ocata/nova/compute/manager.py#L5187 Assuming you're using the libvirt driver, the host vif plug happens here: https://github.com/openstack/nova/blob/stable/ocata/nova/virt/libvirt/driver.py#L1463 And the guest is updated here: https://github.com/openstack/nova/blob/stable/ocata/nova/virt/libvirt/driver.py#L1472 vif_plugging_is_fatal and vif_plugging_timeout don't come into play here because we're attaching an interface to an existing server - or are you talking about during the initial creation of the guest, i.e. this code in the driver? https://github.com/openstack/nova/blob/stable/ocata/nova/virt/libvirt/driver.py#L5257 Are you seeing this in the logs for the given port? https://github.com/openstack/nova/blob/stable/ocata/nova/compute/manager.py#L6875 If not, it could mean that neutron-server never send the event to nova, so nova-compute timed out waiting for the vif plug callback event to tell us that the port is ready and the server can be changed to ACTIVE status. The neutron-server logs should log when external events are being sent to nova for the given port, you probably need to trace the requests and compare the nova-compute and neutron logs for a given server create request. -- Thanks, Matt From yu-kasuya at kddi-research.jp Thu May 17 05:39:03 2018 From: yu-kasuya at kddi-research.jp (Yuki Kasuya) Date: Thu, 17 May 2018 14:39:03 +0900 Subject: [Openstack-operators] [Forum] Fault Management/Monitoring for NFV/Edge/5G/IoT Message-ID: <0091929a-0ca3-ff11-5a41-4525c53a4fb9@kddi-research.jp> Hi All, I've created an etherpad for Fault Management/Monitoring for NFV/Edge/5G/IoT. It'll take place on Tuesday, May 22, 4:40pm-6:10pm @ Room 221-222. If you have any usecase/idea/challenge for FM at these new area, could you join this forum and add any topic/comment to etherpad. https://etherpad.openstack.org/p/YVR-fm-monitoring Best regards, Yuki -- --------------------------------------------- KDDI Research, Inc. Integrated Core Network Control And Management Laboratory Yuki Kasuya yu-kasuya at kddilabs.jp +81 80 9048 8405 From radu.popescu at emag.ro Thu May 17 11:49:48 2018 From: radu.popescu at emag.ro (Radu Popescu | eMAG, Technology) Date: Thu, 17 May 2018 11:49:48 +0000 Subject: [Openstack-operators] attaching network cards to VMs taking a very long time In-Reply-To: Message-ID: Hi, unfortunately, didn't get the reply in my inbox, so I'm answering from the link here: http://lists.openstack.org/pipermail/openstack-operators/2018-May/015270.html (hopefully, my reply will go to the same thread) Anyway, I can see the neutron openvswitch agent logs processing the interface way after the VM is up (in this case, 30 minutes). And after the vif plugin timeout of 5 minutes (currently 10 minutes). 
After searching for logs, I came out with an example here: (replaced nova compute hostname with "nova.compute.hostname") http://paste.openstack.org/show/1VevKuimoBMs4G8X53Eu/ As you can see, the request for the VM starts around 3:27AM. Ports get created, openvswitch has the command to do it, has DHCP, but apparently Neutron server sends the callback after Neutron Openvswitch agent finishes. Callback is at 2018-05-10 03:57:36.177 while Neutron Openvswitch agent says it completed the setup and configuration at 2018-05-10 03:57:35.247. So, my question is, why is Neutron Openvswitch agent processing the request 30 minutes after the VM is started? And where can I search for logs for whatever happens during those 30 minutes? And yes, we're using libvirt. At some point, we added some new nova compute nodes and the new ones came with v3.2.0 and was breaking migration between hosts. That's why we downgraded (and versionlocked) everything at v2.0.0. Thanks, Radu -------------- next part -------------- An HTML attachment was scrubbed... URL: From lmihaiescu at gmail.com Thu May 17 14:46:59 2018 From: lmihaiescu at gmail.com (George Mihaiescu) Date: Thu, 17 May 2018 10:46:59 -0400 Subject: [Openstack-operators] attaching network cards to VMs taking a very long time In-Reply-To: References: Message-ID: We use "vif_plugging_is_fatal = False" and "vif_plugging_timeout = 0" as well as "no-ping" in the dnsmasq-neutron.conf, and large rally tests of 500 instances complete with no issues. These are some good blogposts about Neutron performance: https://www.mirantis.com/blog/openstack-neutron-performance-and-scalability-testing-summary/ https://www.mirantis.com/blog/improving-dhcp-performance-openstack/ I would run a large rally test like this one and see where time is spent mostly: { "NovaServers.boot_and_delete_server": [ { "args": { "flavor": { "name": "c2.small" }, "image": { "name": "^Ubuntu 16.04 - latest$" }, "force_delete": false }, "runner": { "type": "constant", "times": 500, "concurrency": 100 } } ] } Cheers, George On Thu, May 17, 2018 at 7:49 AM, Radu Popescu | eMAG, Technology < radu.popescu at emag.ro> wrote: > Hi, > > unfortunately, didn't get the reply in my inbox, so I'm answering from the > link here: > http://lists.openstack.org/pipermail/openstack-operators/ > 2018-May/015270.html > (hopefully, my reply will go to the same thread) > > Anyway, I can see the neutron openvswitch agent logs processing the > interface way after the VM is up (in this case, 30 minutes). And after the > vif plugin timeout of 5 minutes (currently 10 minutes). > After searching for logs, I came out with an example here: (replaced nova > compute hostname with "nova.compute.hostname") > > http://paste.openstack.org/show/1VevKuimoBMs4G8X53Eu/ > > As you can see, the request for the VM starts around 3:27AM. Ports get > created, openvswitch has the command to do it, has DHCP, but apparently > Neutron server sends the callback after Neutron Openvswitch agent finishes. > Callback is at 2018-05-10 03:57:36.177 while Neutron Openvswitch agent says > it completed the setup and configuration at 2018-05-10 03:57:35.247. > > So, my question is, why is Neutron Openvswitch agent processing the > request 30 minutes after the VM is started? And where can I search for logs > for whatever happens during those 30 minutes? > And yes, we're using libvirt. At some point, we added some new nova > compute nodes and the new ones came with v3.2.0 and was breaking migration > between hosts. 
That's why we downgraded (and versionlocked) everything at > v2.0.0. > > Thanks, > Radu > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Thu May 17 15:42:32 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 17 May 2018 10:42:32 -0500 Subject: [Openstack-operators] attaching network cards to VMs taking a very long time In-Reply-To: References: Message-ID: <715adc7d-64f6-9545-1bf6-5eb13fb1d991@gmail.com> On 5/17/2018 9:46 AM, George Mihaiescu wrote: > and large rally tests of 500 instances complete with no issues. Sure, except you can't ssh into the guests. The whole reason the vif plugging is fatal and timeout and callback code was because the upstream CI was unstable without it. The server would report as ACTIVE but the ports weren't wired up so ssh would fail. Having an ACTIVE guest that you can't actually do anything with is kind of pointless. -- Thanks, Matt From lmihaiescu at gmail.com Thu May 17 15:50:49 2018 From: lmihaiescu at gmail.com (George Mihaiescu) Date: Thu, 17 May 2018 11:50:49 -0400 Subject: [Openstack-operators] attaching network cards to VMs taking a very long time In-Reply-To: <715adc7d-64f6-9545-1bf6-5eb13fb1d991@gmail.com> References: <715adc7d-64f6-9545-1bf6-5eb13fb1d991@gmail.com> Message-ID: We have other scheduled tests that perform end-to-end (assign floating IP, ssh, ping outside) and never had an issue. I think we turned it off because the callback code was initially buggy and nova would wait forever while things were in fact ok, but I'll change "vif_plugging_is_fatal = True" and "vif_plugging_timeout = 300" and run another large test, just to confirm. We usually run these large tests after a version upgrade to test the APIs under load. On Thu, May 17, 2018 at 11:42 AM, Matt Riedemann wrote: > On 5/17/2018 9:46 AM, George Mihaiescu wrote: > >> and large rally tests of 500 instances complete with no issues. >> > > Sure, except you can't ssh into the guests. > > The whole reason the vif plugging is fatal and timeout and callback code > was because the upstream CI was unstable without it. The server would > report as ACTIVE but the ports weren't wired up so ssh would fail. Having > an ACTIVE guest that you can't actually do anything with is kind of > pointless. > > -- > > Thanks, > > Matt > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Thu May 17 16:39:13 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 17 May 2018 11:39:13 -0500 Subject: [Openstack-operators] Need feedback for nova aborting cold migration function In-Reply-To: <1526374144863.89140@nttdata.co.jp> References: <1525919628734.2105@nttdata.co.jp> <5fea9373-021a-0a2e-ba91-d7fe62bd5ca9@gmail.com> <1526374144863.89140@nttdata.co.jp> Message-ID: <9b1c9c3d-00dc-d073-96e7-4d6409521261@gmail.com> On 5/15/2018 3:48 AM, sagaray at nttdata.co.jp wrote: > We store the service logs which are created by VM on that storage. I don't mean to be glib, but have you considered maybe not doing that? 
-- Thanks, Matt From mriedemos at gmail.com Thu May 17 20:36:01 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 17 May 2018 15:36:01 -0500 Subject: [Openstack-operators] [nova] FYI on changes that might impact out of tree scheduler filters Message-ID: <58e08692-483a-9188-d2ee-e02978ce995c@gmail.com> CERN has upgraded to Cells v2 and is doing performance testing of the scheduler and were reporting some things today which got us back to this bug [1]. So I've starting pushing some patches related to this but also related to an older blueprint I created [2]. In summary, we do quite a bit of DB work just to load up a list of instance objects per host that the in-tree filters don't even use. The first change [3] is a simple optimization to avoid the default joins on the instance_info_caches and security_groups tables. If you have out of tree filters that, for whatever reason, rely on the HostState.instances objects to have info_cache or security_groups set, they'll continue to work, but will have to round-trip to the DB to lazy-load the fields, which is going to be a performance penalty on that filter. See the change for details. The second change in the series [4] is more drastic in that we'll do away with pulling the full Instance object per host, which means only a select set of optional fields can be lazy-loaded [5], and the rest will result in an exception. The patch currently has a workaround config option to continue doing things the old way if you have out of tree filters that rely on this, but for good citizens with only in-tree filters, you will get a performance improvement during scheduling. There are some other things we can do to optimize more of this flow, but this email is just about the ones that have patches up right now. [1] https://bugs.launchpad.net/nova/+bug/1737465 [2] https://blueprints.launchpad.net/nova/+spec/put-host-manager-instance-info-on-a-diet [3] https://review.openstack.org/#/c/569218/ [4] https://review.openstack.org/#/c/569247/ [5] https://github.com/openstack/nova/blob/de52fefa1fd52ccaac6807e5010c5f2a2dcbaab5/nova/objects/instance.py#L66 -- Thanks, Matt From gouthampravi at gmail.com Thu May 17 20:40:34 2018 From: gouthampravi at gmail.com (Goutham Pacha Ravi) Date: Thu, 17 May 2018 13:40:34 -0700 Subject: [Openstack-operators] [manila] manila operator's feedback forum etherpad available Message-ID: Cross posting from Openstack-dev because Tom's unable to post to this list yet. Manila operators, please note the session at the Forum next week. Thanks, Goutham ---------- Forwarded message ---------- From: Tom Barron Date: Thu, May 17, 2018 at 10:57 AM Subject: [openstack-dev] [manila] manila operator's feedback forum etherpad available To: openstack-operators at lists.openstack.org, openstack-dev at lists.openstack.org Next week at the Summit there is a forum session dedicated to Manila opertors' feedback on Thursday from 1:50-2:30pm [1] for which we have started an etherpad [2]. Please come and help manila developers do the right thing! We're particularly interested in experiences running the OpenStack share service at scale and overcoming any obstacles to deployment but are interested in getting any and all feedback from real deployments so that we can tailor our development and maintenance efforts to real world needs. Please feel free and encouraged to add to the etherpad starting now. See you there! 
-- Tom Barron Manila PTL irc: tbarron [1] https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21780/manila-ops-feedback-running-at-scale-overcoming-barriers-to-deployment [2] https://etherpad.openstack.org/p/YVR18-manila-forum-ops-feedback From rochelle.grober at huawei.com Fri May 18 00:55:22 2018 From: rochelle.grober at huawei.com (Rochelle Grober) Date: Fri, 18 May 2018 00:55:22 +0000 Subject: [Openstack-operators] [Forum] [all] [Stable] OpenStack is "mature" -- time to get serious on Maintainers -- Session etherpad and food for thought for discussion Message-ID: Folks, TL;DR The last session related to extended releases is: OpenStack is "mature" -- time to get serious on Maintainers It will be in room 220 at 11:00-11:40 The etherpad for the last session in the series on Extended releases is here: https://etherpad.openstack.org/p/YVR-openstack-maintainers-maint-pt3 There are links to info on other communities’ maintainer process/role/responsibilities also, as reference material on how other have made it work (or not). The nitty gritty details: The upcoming Forum is filled with sessions that are focused on issues needed to improve and maintain the sustainability of OpenStack projects for the long term. We have discussion on reducing technical debt, extended releases, fast forward installs, bringing Ops and User communities closer together, etc. The community is showing it is now invested in activities that are often part of “Sustaining Engineering” teams (corporate speak) or “Maintainers (OSS speak). We are doing this; we are thinking about the moving parts to do this; let’s think about the contributors who want to do these and bring some clarity to their roles and the processes they need to be successful. I am hoping you read this and keep these ideas in mind as you participate in the various Forum sessions. Then you can bring the ideas generated during all these discussions to the Maintainers session near the end of the Summit to brainstorm how to visualize and define this new(ish) component of our technical community. So, who has been doing the maintenance work so far? Mostly (mostly) unsung heroes like the Stable Release team, Release team, Oslo team, project liaisons and the community goals champions (yes, moving to py3 is a sustaining/maintenance type of activity). And some operators (Hi, mnaser!). We need to lean on their experience and what we think the community will need to reduce that technical debt to outline what the common tasks of maintainers should be, what else might fall in their purview, and how to partner with them to better serve them. With API lower limits, new tool versions, placement, py3, and even projects reaching “code complete” or “maintenance mode,” there is a lot of work for maintainers to do (I really don’t like that term, but is there one that fits OpenStack’s community?). It would be great if we could find a way to share the load such that we can have part time contributors here. We know that operators know how to cherrypick, test in there clouds, do bug fixes. How do we pair with them to get fixes upstreamed without requiring them to be full on developers? We have a bunch of alumni who have stopped being “cores” and sometimes even developers, but who love our community and might be willing and able to put in a few hours a week, maybe reviewing small patches, providing help with user/ops submitted patch requests, or whatever. They were trusted with +2 and +W in the past, so we should at least be able to trust they know what they know. 
We would need some way to identify them to Cores, since they would be sort of 1.5 on the voting scale, but…… So, burn out is high in other communities for maintainers. We need to find a way to make sustaining the stable parts of OpenStack sustainable. Hope you can make the talk, or add to the etherpad, or both. The etherpad is very much still a work in progress (trying to organize it to make sense). If you want to jump in now, go for it, otherwise it should be in reasonable shape for use at the session. I hope we get a good mix of community and a good collection of those who are already doing the job without title. Thanks and see you next week. --rocky ________________________________ 华为技术有限公司 Huawei Technologies Co., Ltd. [Company_logo] Rochelle Grober Sr. Staff Architect, Open Source Office Phone:408-330-5472 Email:rochelle.grober at huawei.com ________________________________ 本邮件及其附件含有华为公司的保密信息,仅限于发送给上面地址中列出的个人或群组。禁止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制、或散发)本邮件中的信息。如果您错收了本邮件,请您立即电话或邮件通知发件人并删除本邮件! This e-mail and its attachments contain confidential information from HUAWEI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it! -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 5474 bytes Desc: image001.png URL: From radu.popescu at emag.ro Fri May 18 08:21:55 2018 From: radu.popescu at emag.ro (Radu Popescu | eMAG, Technology) Date: Fri, 18 May 2018 08:21:55 +0000 Subject: [Openstack-operators] attaching network cards to VMs taking a very long time In-Reply-To: References: <715adc7d-64f6-9545-1bf6-5eb13fb1d991@gmail.com> Message-ID: <350f070b9d654a0a5430fafb07bcc1d41c98d2f8.camel@emag.ro> Hi, so, nova says the VM is ACTIVE and it actually boots with no network. We are setting some metadata that we use later on and have cloud-init for different tasks. So, the VM is up and the OS is running, but networking only starts working after a random amount of time, which can get to around 45 minutes. Thing is, it's not happening to all VMs in that test (around 300), but it's happening to a fair amount - around 25%. I can see the callback coming a few seconds after the neutron openvswitch agent says it's completed the setup. My question is, why is it taking so long for the neutron openvswitch agent to configure the port? I can see the port up in both the host OS and openvswitch. I would assume it's doing the whole namespace and iptables setup. But still, 30 minutes? That seems like a lot! Thanks, Radu On Thu, 2018-05-17 at 11:50 -0400, George Mihaiescu wrote: We have other scheduled tests that perform end-to-end (assign floating IP, ssh, ping outside) and never had an issue. I think we turned it off because the callback code was initially buggy and nova would wait forever while things were in fact ok, but I'll change "vif_plugging_is_fatal = True" and "vif_plugging_timeout = 300" and run another large test, just to confirm. We usually run these large tests after a version upgrade to test the APIs under load. On Thu, May 17, 2018 at 11:42 AM, Matt Riedemann > wrote: On 5/17/2018 9:46 AM, George Mihaiescu wrote: and large rally tests of 500 instances complete with no issues.
Sure, except you can't ssh into the guests. The whole reason the vif plugging is fatal and timeout and callback code was because the upstream CI was unstable without it. The server would report as ACTIVE but the ports weren't wired up so ssh would fail. Having an ACTIVE guest that you can't actually do anything with is kind of pointless. _______________________________________________ OpenStack-operators mailing list OpenStack-operators at lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators -------------- next part -------------- An HTML attachment was scrubbed... URL: From ghanshyammann at gmail.com Fri May 18 09:08:03 2018 From: ghanshyammann at gmail.com (Ghanshyam Mann) Date: Fri, 18 May 2018 18:08:03 +0900 Subject: [Openstack-operators] [Openstack-sigs] [First Contact] [SIG] [Forum] First Contact SIG Operator Inclusion Session Message-ID: Hi All, As you might know, FirstContact SIG is planning a forum sessions "First Contact SIG Operator Inclusion" [1] on Monday, May 21, 3:10pm. This session will discuss about Operators inclusion in FirstConact SIG to setup the operator bridge in this SIG. Hope to see more operators in this sessions and their valuable feedback/help. For more details, please go through the etherpad [2]. ..1 https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21712/first-contact-sig-operator-inclusion ..2 https://etherpad.openstack.org/p/FC-SIG-Ops-Inclusion -gmann From ebiibe82 at gmail.com Fri May 18 10:46:39 2018 From: ebiibe82 at gmail.com (Amit Kumar) Date: Fri, 18 May 2018 16:16:39 +0530 Subject: [Openstack-operators] [OpenStack-Operators][OpenStack] Regarding production grade OpenStack deployment Message-ID: Hi All, We want to deploy our private cloud using OpenStack as highly available (zero downtime (ZDT) - in normal course of action and during upgrades as well) production grade environment. We came across following tools. - We thought of using *Kolla-Kubernetes* as deployment tool, but we got feedback from Kolla IRC channel that this project is being retired. Moreover, we couldn't find latest documents having multi-node deployment steps and, High Availability support was also not mentioned at all anywhere in the documentation. - Another option to have Kubernetes based deployment is to use OpenStack-Helm, but it seems the OSH community has not made OSH 1.0 officially available yet. - Last option, is to use *Kolla-Ansible*, although it is not a Kubernetes deployment, but seems to have good community support around it. Also, its documentation talks a little about production grade deployment, probably it is being used in production grade environments. If you folks have used any of these tools for deploying OpenStack to fulfill these requirements: HA and ZDT, then please provide your inputs specifically about HA and ZDT support of the deployment tool, based on your experience. And please share if you have any reference links that you have used for achieving HA and ZDT for the respective tools. Lastly, if you think we should think that we have missed another more viable and stable options of deployment tools which can serve our requirement: HA and ZDT, then please do suggest the same. Regards, Amit -------------- next part -------------- An HTML attachment was scrubbed... 
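For context on the HA side of this question: with kolla-ansible, control-plane HA is mostly a matter of listing several controller hosts in the multinode inventory and letting the built-in haproxy/keepalived pair front the APIs on a VIP. A minimal sketch follows; the option names are taken from the kolla-ansible globals.yml of that era and the addresses and hostnames are placeholders, so check them against the release you deploy:

  # /etc/kolla/globals.yml (excerpt)
  kolla_base_distro: "centos"
  openstack_release: "queens"                 # pick your target release
  kolla_internal_vip_address: "10.10.10.254"  # keepalived-managed VIP
  enable_haproxy: "yes"                       # API endpoints served via the VIP

  # multinode inventory (excerpt): three controllers give a redundant
  # control plane; compute nodes go in their own group
  [control]
  ctrl01
  ctrl02
  ctrl03

  [compute]
  cmp[01:10]
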
URL: From gael.therond at gmail.com Fri May 18 12:17:58 2018 From: gael.therond at gmail.com (Flint WALRUS) Date: Fri, 18 May 2018 14:17:58 +0200 Subject: [Openstack-operators] [OpenStack-Operators][OpenStack] Regarding production grade OpenStack deployment In-Reply-To: References: Message-ID: Hi amit, I’m using kolla-ansible as a solution on our own infrastructure, however, be aware that because of the nature of Openstack you wont be able to achieve zero downtime if your hosted application do not take advantage of the distributed natre of ressources or if they’re not basically Cloud ready. Cheers. Le ven. 18 mai 2018 à 12:47, Amit Kumar a écrit : > Hi All, > > We want to deploy our private cloud using OpenStack as highly available > (zero downtime (ZDT) - in normal course of action and during upgrades as > well) production grade environment. We came across following tools. > > > - We thought of using *Kolla-Kubernetes* as deployment tool, but we > got feedback from Kolla IRC channel that this project is being retired. > Moreover, we couldn't find latest documents having multi-node deployment > steps and, High Availability support was also not mentioned at all anywhere > in the documentation. > - Another option to have Kubernetes based deployment is to use > OpenStack-Helm, but it seems the OSH community has not made OSH 1.0 > officially available yet. > - Last option, is to use *Kolla-Ansible*, although it is not a > Kubernetes deployment, but seems to have good community support around it. > Also, its documentation talks a little about production grade deployment, > probably it is being used in production grade environments. > > > If you folks have used any of these tools for deploying OpenStack to > fulfill these requirements: HA and ZDT, then please provide your inputs > specifically about HA and ZDT support of the deployment tool, based on your > experience. And please share if you have any reference links that you have > used for achieving HA and ZDT for the respective tools. > > Lastly, if you think we should think that we have missed another more > viable and stable options of deployment tools which can serve our > requirement: HA and ZDT, then please do suggest the same. > > Regards, > Amit > > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -------------- next part -------------- An HTML attachment was scrubbed... URL: From james.page at canonical.com Fri May 18 13:15:07 2018 From: james.page at canonical.com (James Page) Date: Fri, 18 May 2018 14:15:07 +0100 Subject: [Openstack-operators] [sig] [upgrades] inaugural meeting minutes & vancouver forum In-Reply-To: References: Message-ID: Hi All Lujin, Lee and myself held the inaugural IRC meeting for the Upgrades SIG this week (see [0]). Suffice to say that, due to other time pressures, setup of the SIG has taken a lot longer than desired, but hopefully now we have the ball rolling we can keep up a bit of momentum. The Upgrades SIG intended to meet weekly, alternating between slots that work for (hopefully) all time zones: http://eavesdrop.openstack.org/#Upgrades_SIG That said, we'll skip next weeks meeting due to the OpenStack Summit and Forum in Vancouver, where we have a BoF on the schedule (see [1]) instead. If you're interested in OpenStack Upgrades the BoF and Erik's sessions on Fast Forward Upgrades (see [2]) should be on your schedule for next week! 
Cheers James [0] http://eavesdrop.openstack.org/meetings/upgrade_sig/2018/upgrade_sig.2018-05-15-09.06.html [1] https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21855/upgrade-sig-bof [2] https://www.openstack.org/summit/vancouver-2018/summit-schedule/global-search?t=upgrades -------------- next part -------------- An HTML attachment was scrubbed... URL: From emccormick at cirrusseven.com Fri May 18 16:42:17 2018 From: emccormick at cirrusseven.com (Erik McCormick) Date: Fri, 18 May 2018 09:42:17 -0700 Subject: [Openstack-operators] Fast Forward Upgrades (FFU) Forum Sessions Message-ID: Hello all, There are two forum sessions in Vancouver covering Fast Forward Upgrades. Session 1 (Current State): Wednesday May 23rd, 09:00 - 09:40, Room 220 Session 2 (Future Work): Wednesday May 23rd, 09:50 - 10:30, Room 220 The combined etherpad for both sessions can be found at: https://etherpad.openstack.org/p/YVR-forum-fast-forward-upgrades Please take some time to add in topics you would like to see discussed or add any other pertinent information. There are several reference links at the top which are worth reviewing prior to the sessions if you have the time. See you all in Vancover! Cheers, Erik From ebiibe82 at gmail.com Fri May 18 18:59:52 2018 From: ebiibe82 at gmail.com (Amit Kumar) Date: Sat, 19 May 2018 00:29:52 +0530 Subject: [Openstack-operators] [OpenStack-Operators][OpenStack] Regarding production grade OpenStack deployment In-Reply-To: References: Message-ID: Hi, Thanks for sharing your experience. I am talking about HA of only OpenStack services and not the hosted applications or the OpenStack instances they are hosted on. So, for now it is not the requirement. But from your response, it seems that you have deployed OpenStack with Kolla-Ansible in multi node, multi-Controller architecture, right? And any experience with Kolla-Ansible from OpenStack release upgrade perspective? Is ZDT of OpenStack services feasible while upgrading? Regards, Amit On May 18, 2018 5:48 PM, "Flint WALRUS" wrote: Hi amit, I’m using kolla-ansible as a solution on our own infrastructure, however, be aware that because of the nature of Openstack you wont be able to achieve zero downtime if your hosted application do not take advantage of the distributed natre of ressources or if they’re not basically Cloud ready. Cheers. Le ven. 18 mai 2018 à 12:47, Amit Kumar a écrit : > Hi All, > > We want to deploy our private cloud using OpenStack as highly available > (zero downtime (ZDT) - in normal course of action and during upgrades as > well) production grade environment. We came across following tools. > > > - We thought of using *Kolla-Kubernetes* as deployment tool, but we > got feedback from Kolla IRC channel that this project is being retired. > Moreover, we couldn't find latest documents having multi-node deployment > steps and, High Availability support was also not mentioned at all anywhere > in the documentation. > - Another option to have Kubernetes based deployment is to use > OpenStack-Helm, but it seems the OSH community has not made OSH 1.0 > officially available yet. > - Last option, is to use *Kolla-Ansible*, although it is not a > Kubernetes deployment, but seems to have good community support around it. > Also, its documentation talks a little about production grade deployment, > probably it is being used in production grade environments. 
> > > If you folks have used any of these tools for deploying OpenStack to > fulfill these requirements: HA and ZDT, then please provide your inputs > specifically about HA and ZDT support of the deployment tool, based on your > experience. And please share if you have any reference links that you have > used for achieving HA and ZDT for the respective tools. > > Lastly, if you think we should think that we have missed another more > viable and stable options of deployment tools which can serve our > requirement: HA and ZDT, then please do suggest the same. > > Regards, > Amit > > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Kevin.Fox at pnnl.gov Fri May 18 20:07:01 2018 From: Kevin.Fox at pnnl.gov (Fox, Kevin M) Date: Fri, 18 May 2018 20:07:01 +0000 Subject: [Openstack-operators] [OpenStack-Operators][OpenStack] Regarding production grade OpenStack deployment In-Reply-To: References: Message-ID: <1A3C52DFCD06494D8528644858247BF01C0D11A7@EX10MBOX03.pnnl.gov> I don't think openstack itself can meet full zero downtime requirements. But if it can, then I also think none of the deployment tools try and support that use case either. Thanks, Kevin ________________________________ From: Amit Kumar [ebiibe82 at gmail.com] Sent: Friday, May 18, 2018 3:46 AM To: OpenStack Operators; Openstack Subject: [Openstack-operators] [OpenStack-Operators][OpenStack] Regarding production grade OpenStack deployment Hi All, We want to deploy our private cloud using OpenStack as highly available (zero downtime (ZDT) - in normal course of action and during upgrades as well) production grade environment. We came across following tools. * We thought of using Kolla-Kubernetes as deployment tool, but we got feedback from Kolla IRC channel that this project is being retired. Moreover, we couldn't find latest documents having multi-node deployment steps and, High Availability support was also not mentioned at all anywhere in the documentation. * Another option to have Kubernetes based deployment is to use OpenStack-Helm, but it seems the OSH community has not made OSH 1.0 officially available yet. * Last option, is to use Kolla-Ansible, although it is not a Kubernetes deployment, but seems to have good community support around it. Also, its documentation talks a little about production grade deployment, probably it is being used in production grade environments. If you folks have used any of these tools for deploying OpenStack to fulfill these requirements: HA and ZDT, then please provide your inputs specifically about HA and ZDT support of the deployment tool, based on your experience. And please share if you have any reference links that you have used for achieving HA and ZDT for the respective tools. Lastly, if you think we should think that we have missed another more viable and stable options of deployment tools which can serve our requirement: HA and ZDT, then please do suggest the same. Regards, Amit -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gael.therond at gmail.com Fri May 18 20:35:56 2018 From: gael.therond at gmail.com (Flint WALRUS) Date: Fri, 18 May 2018 22:35:56 +0200 Subject: [Openstack-operators] [OpenStack-Operators][OpenStack] Regarding production grade OpenStack deployment In-Reply-To: <1A3C52DFCD06494D8528644858247BF01C0D11A7@EX10MBOX03.pnnl.gov> References: <1A3C52DFCD06494D8528644858247BF01C0D11A7@EX10MBOX03.pnnl.gov> Message-ID: Oh ok! Yes, if you only focus on the control plan, the answer is yes, I’m using kolla-ansible and it’s working really well. It helped us to bring more services online more quickly and solved our lifecycle management that was kind of tricky. I’m using a Blue/Green deployement method and yes I’m using multinode form. Remember that kolla-ansible is a simple shell script wrapping ansible-playbook and that if you’re curious of what the playbooks look like you just have to install it (or goes on github) with pip and then get your hands on it. Kind regards. Le ven. 18 mai 2018 à 22:07, Fox, Kevin M a écrit : > I don't think openstack itself can meet full zero downtime requirements. > But if it can, then I also think none of the deployment tools try and > support that use case either. > > Thanks, > Kevin > ------------------------------ > *From:* Amit Kumar [ebiibe82 at gmail.com] > *Sent:* Friday, May 18, 2018 3:46 AM > *To:* OpenStack Operators; Openstack > *Subject:* [Openstack-operators] [OpenStack-Operators][OpenStack] > Regarding production grade OpenStack deployment > > Hi All, > > We want to deploy our private cloud using OpenStack as highly available > (zero downtime (ZDT) - in normal course of action and during upgrades as > well) production grade environment. We came across following tools. > > > - We thought of using *Kolla-Kubernetes* as deployment tool, but we > got feedback from Kolla IRC channel that this project is being retired. > Moreover, we couldn't find latest documents having multi-node deployment > steps and, High Availability support was also not mentioned at all anywhere > in the documentation. > - Another option to have Kubernetes based deployment is to use > OpenStack-Helm, but it seems the OSH community has not made OSH 1.0 > officially available yet. > - Last option, is to use *Kolla-Ansible*, although it is not a > Kubernetes deployment, but seems to have good community support around it. > Also, its documentation talks a little about production grade deployment, > probably it is being used in production grade environments. > > > If you folks have used any of these tools for deploying OpenStack to > fulfill these requirements: HA and ZDT, then please provide your inputs > specifically about HA and ZDT support of the deployment tool, based on your > experience. And please share if you have any reference links that you have > used for achieving HA and ZDT for the respective tools. > > Lastly, if you think we should think that we have missed another more > viable and stable options of deployment tools which can serve our > requirement: HA and ZDT, then please do suggest the same. > > Regards, > Amit > > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -------------- next part -------------- An HTML attachment was scrubbed... 
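For anyone following along, the multinode workflow Flint describes boils down to a handful of wrapper commands around ansible-playbook. This is only an outline based on the kolla-ansible quick-start documentation, so adjust the inventory path, versions and password handling to your environment:

  pip install kolla-ansible
  cp /usr/share/kolla-ansible/ansible/inventory/multinode .
  kolla-genpwd                               # fills /etc/kolla/passwords.yml
  kolla-ansible -i ./multinode bootstrap-servers
  kolla-ansible -i ./multinode prechecks
  kolla-ansible -i ./multinode pull          # optional: pre-fetch images
  kolla-ansible -i ./multinode deploy
  kolla-ansible -i ./multinode post-deploy   # writes the admin openrc file

Upgrades follow the same pattern with "kolla-ansible -i ./multinode upgrade" after bumping openstack_release, which is where the (near) zero-downtime discussion in this thread really gets tested.
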
URL: From chris.friesen at windriver.com Fri May 18 20:39:39 2018 From: chris.friesen at windriver.com (Chris Friesen) Date: Fri, 18 May 2018 14:39:39 -0600 Subject: [Openstack-operators] [OpenStack-Operators][OpenStack] Regarding production grade OpenStack deployment In-Reply-To: <1A3C52DFCD06494D8528644858247BF01C0D11A7@EX10MBOX03.pnnl.gov> References: <1A3C52DFCD06494D8528644858247BF01C0D11A7@EX10MBOX03.pnnl.gov> Message-ID: <5AFF3A0B.3010400@windriver.com> Are you talking about downtime of instances (and the dataplane), or of the OpenStack API and control plane? And when you say "zero downtime" are you really talking about "five nines" or similar? Because nothing is truly zero downtime. If you care about HA then you'll need additional components outside of OpenStack proper. You'll need health checks on your physical nodes, health checks on your network links, possibly end-to-end health checks up into the applications running in your guests, redundant network paths, redundant controller nodes, HA storage, etc. You'll have to think about how to ensure your database and messaging service are HA. You may want to look at ensuring that your OpenStack services do not interfere with the VMs running on that node and vice versa. We ended up rolling our own install mechanisms because we weren't satisfied with any of the existing projects. That was a while ago now so I don't know how far they've come. Chris On 05/18/2018 02:07 PM, Fox, Kevin M wrote: > I don't think openstack itself can meet full zero downtime requirements. But if > it can, then I also think none of the deployment tools try and support that use > case either. > > Thanks, > Kevin > -------------------------------------------------------------------------------- > *From:* Amit Kumar [ebiibe82 at gmail.com] > *Sent:* Friday, May 18, 2018 3:46 AM > *To:* OpenStack Operators; Openstack > *Subject:* [Openstack-operators] [OpenStack-Operators][OpenStack] Regarding > production grade OpenStack deployment > > Hi All, > > We want to deploy our private cloud using OpenStack as highly available (zero > downtime (ZDT) - in normal course of action and during upgrades as well) > production grade environment. We came across following tools. > > * We thought of using /Kolla-Kubernetes/ as deployment tool, but we got > feedback from Kolla IRC channel that this project is being retired. > Moreover, we couldn't find latest documents having multi-node deployment > steps and, High Availability support was also not mentioned at all anywhere > in the documentation. > * Another option to have Kubernetes based deployment is to use OpenStack-Helm, > but it seems the OSH community has not made OSH 1.0 officially available yet. > * Last option, is to use /Kolla-Ansible/, although it is not a Kubernetes > deployment, but seems to have good community support around it. Also, its > documentation talks a little about production grade deployment, probably it > is being used in production grade environments. > > > If you folks have used any of these tools for deploying OpenStack to fulfill > these requirements: HA and ZDT, then please provide your inputs specifically > about HA and ZDT support of the deployment tool, based on your experience. And > please share if you have any reference links that you have used for achieving HA > and ZDT for the respective tools. > > Lastly, if you think we should think that we have missed another more viable and > stable options of deployment tools which can serve our requirement: HA and ZDT, > then please do suggest the same. 
> > Regards, > Amit > > > > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > From lbragstad at gmail.com Fri May 18 21:02:47 2018 From: lbragstad at gmail.com (Lance Bragstad) Date: Fri, 18 May 2018 16:02:47 -0500 Subject: [Openstack-operators] [User-committee] [Forum] [all] [Stable] OpenStack is "mature" -- time to get serious on Maintainers -- Session etherpad and food for thought for discussion In-Reply-To: References: Message-ID: <1d7a6055-df34-c0f6-98a0-d8a8f9cfafa8@gmail.com> Here is the link to the session in case you'd like to add it to your schedule [0]. [0] https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21759/openstack-is-mature-time-to-get-serious-on-maintainers On 05/17/2018 07:55 PM, Rochelle Grober wrote: > > Folks, > >   > > TL;DR > > The last session related to extended releases is: OpenStack is > "mature" -- time to get serious on Maintainers > It will be in room 220 at 11:00-11:40 > > The etherpad for the last session in the series on Extended releases > is here: > > https://etherpad.openstack.org/p/YVR-openstack-maintainers-maint-pt3 > >   > > There are links to info on other communities’ maintainer > process/role/responsibilities also, as reference material on how other > have made it work (or not). > >   > > The nitty gritty details: > >   > > The upcoming Forum is filled with sessions that are focused on issues > needed to improve and maintain the sustainability of OpenStack > projects for the long term.  We have discussion on reducing technical > debt, extended releases, fast forward installs, bringing Ops and User > communities closer together, etc.  The community is showing it is now > invested in activities that are often part of “Sustaining Engineering” > teams (corporate speak) or “Maintainers (OSS speak).  We are doing > this; we are thinking about the moving parts to do this; let’s think > about the contributors who want to do these and bring some clarity to > their roles and the processes they need to be successful.  I am hoping > you read this and keep these ideas in mind as you participate in the > various Forum sessions.  Then you can bring the ideas generated during > all these discussions to the Maintainers session near the end of the > Summit to brainstorm how to visualize and define this new(ish) > component of our technical community. > >   > > So, who has been doing the maintenance work so far?  Mostly (mostly) > unsung heroes like the Stable Release team, Release team, Oslo team, > project liaisons and the community goals champions (yes, moving to py3 > is a sustaining/maintenance type of activity).  And some operators > (Hi, mnaser!).  We need to lean on their experience and what we think > the community will need to reduce that technical debt to outline what > the common tasks of maintainers should be, what else might fall in > their purview, and how to partner with them to better serve them. > >   > > With API lower limits, new tool versions, placement, py3, and even > projects reaching “code complete” or “maintenance mode,” there is a > lot of work for maintainers to do (I really don’t like that term, but > is there one that fits OpenStack’s community?).  It would be great if > we could find a way to share the load such that we can have part time > contributors here.  We know that operators know how to cherrypick, > test in there clouds, do bug fixes.  
How do we pair with them to get > fixes upstreamed without requiring them to be full on developers?  We > have a bunch of alumni who have stopped being “cores” and sometimes > even developers, but who love our community and might be willing and > able to put in a few hours a week, maybe reviewing small patches, > providing help with user/ops submitted patch requests, or whatever.  > They were trusted with +2 and +W in the past, so we should at least be > able to trust they know what they know.  We  would need some way to > identify them to Cores, since they would be sort of 1.5 on the voting > scale, but…… > >   > > So, burn out is high in other communities for maintainers.  We need to > find a way to make sustaining the stable parts of OpenStack sustainable. > >   > > Hope you can make the talk, or add to the etherpad, or both.  The > etherpad is very musch still a work in progress (trying to organize it > to make sense).  If you want to jump in now, go for it, otherwise it > should be in reasonable shape for use at the session.  I hope we get a > good mix of community and a good collection of those who are already > doing the job without title. > >   > > Thanks and see you next week. > > --rocky > >   > >   > >   > > ------------------------------------------------------------------------ > > 华为技术有限公司 Huawei Technologies Co., Ltd. > > Company_logo > > Rochelle Grober > > Sr. Staff Architect, Open Source > Office Phone:408-330-5472 > Email:rochelle.grober at huawei.com > > ------------------------------------------------------------------------ > > 本邮件及其附件含有华为公司的保密信息,仅限于发送给上面地址中列出的个人或群组。禁 > 止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制、或散发)本邮件中 > 的信息。如果您错收了本邮件,请您立即电话或邮件通知发件人并删除本邮件! > This e-mail and its attachments contain confidential information from > HUAWEI, which > is intended only for the person or entity whose address is listed > above. Any use of the > information contained herein in any way (including, but not limited > to, total or partial > disclosure, reproduction, or dissemination) by persons other than the > intended > recipient(s) is prohibited. If you receive this e-mail in error, > please notify the sender by > phone or email immediately and delete it! > >   > > > > _______________________________________________ > User-committee mailing list > User-committee at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/user-committee -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 5474 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: OpenPGP digital signature URL: From rochelle.grober at huawei.com Fri May 18 21:07:46 2018 From: rochelle.grober at huawei.com (Rochelle Grober) Date: Fri, 18 May 2018 21:07:46 +0000 Subject: [Openstack-operators] [User-committee] [Forum] [all] [Stable] OpenStack is "mature" -- time to get serious on Maintainers -- Session etherpad and food for thought for discussion In-Reply-To: <1d7a6055-df34-c0f6-98a0-d8a8f9cfafa8@gmail.com> References: <1d7a6055-df34-c0f6-98a0-d8a8f9cfafa8@gmail.com> Message-ID: Thanks, Lance! Also, the more I think about it, the more I think Maintainer has too much baggage to use that term for this role. It really is “continuity” that we are looking for. Continuous important fixes, continuous updates of tools used to produce the SW. 
Keep this in the back of your minds for the discussion. And yes, this is a discussion to see if we are interested, and only if there is interest, how to move forward. --Rocky From: Lance Bragstad [mailto:lbragstad at gmail.com] Sent: Friday, May 18, 2018 2:03 PM To: Rochelle Grober ; openstack-dev ; openstack-operators ; user-committee Subject: Re: [User-committee] [Forum] [all] [Stable] OpenStack is "mature" -- time to get serious on Maintainers -- Session etherpad and food for thought for discussion Here is the link to the session in case you'd like to add it to your schedule [0]. [0] https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21759/openstack-is-mature-time-to-get-serious-on-maintainers On 05/17/2018 07:55 PM, Rochelle Grober wrote: Folks, TL;DR The last session related to extended releases is: OpenStack is "mature" -- time to get serious on Maintainers It will be in room 220 at 11:00-11:40 The etherpad for the last session in the series on Extended releases is here: https://etherpad.openstack.org/p/YVR-openstack-maintainers-maint-pt3 There are links to info on other communities’ maintainer process/role/responsibilities also, as reference material on how other have made it work (or not). The nitty gritty details: The upcoming Forum is filled with sessions that are focused on issues needed to improve and maintain the sustainability of OpenStack projects for the long term. We have discussion on reducing technical debt, extended releases, fast forward installs, bringing Ops and User communities closer together, etc. The community is showing it is now invested in activities that are often part of “Sustaining Engineering” teams (corporate speak) or “Maintainers (OSS speak). We are doing this; we are thinking about the moving parts to do this; let’s think about the contributors who want to do these and bring some clarity to their roles and the processes they need to be successful. I am hoping you read this and keep these ideas in mind as you participate in the various Forum sessions. Then you can bring the ideas generated during all these discussions to the Maintainers session near the end of the Summit to brainstorm how to visualize and define this new(ish) component of our technical community. So, who has been doing the maintenance work so far? Mostly (mostly) unsung heroes like the Stable Release team, Release team, Oslo team, project liaisons and the community goals champions (yes, moving to py3 is a sustaining/maintenance type of activity). And some operators (Hi, mnaser!). We need to lean on their experience and what we think the community will need to reduce that technical debt to outline what the common tasks of maintainers should be, what else might fall in their purview, and how to partner with them to better serve them. With API lower limits, new tool versions, placement, py3, and even projects reaching “code complete” or “maintenance mode,” there is a lot of work for maintainers to do (I really don’t like that term, but is there one that fits OpenStack’s community?). It would be great if we could find a way to share the load such that we can have part time contributors here. We know that operators know how to cherrypick, test in there clouds, do bug fixes. How do we pair with them to get fixes upstreamed without requiring them to be full on developers? 
We have a bunch of alumni who have stopped being “cores” and sometimes even developers, but who love our community and might be willing and able to put in a few hours a week, maybe reviewing small patches, providing help with user/ops submitted patch requests, or whatever. They were trusted with +2 and +W in the past, so we should at least be able to trust they know what they know. We would need some way to identify them to Cores, since they would be sort of 1.5 on the voting scale, but…… So, burn out is high in other communities for maintainers. We need to find a way to make sustaining the stable parts of OpenStack sustainable. Hope you can make the talk, or add to the etherpad, or both. The etherpad is very musch still a work in progress (trying to organize it to make sense). If you want to jump in now, go for it, otherwise it should be in reasonable shape for use at the session. I hope we get a good mix of community and a good collection of those who are already doing the job without title. Thanks and see you next week. --rocky ________________________________ 华为技术有限公司 Huawei Technologies Co., Ltd. [Company_logo] Rochelle Grober Sr. Staff Architect, Open Source Office Phone:408-330-5472 Email:rochelle.grober at huawei.com ________________________________  本邮件及其附件含有华为公司的保密信息,仅限于发送给上面地址中列出的个人或群组。禁 止任何其他人以任何形式使用(包括但不限于全部或部分地泄露、复制、或散发)本邮件中 的信息。如果您错收了本邮件,请您立即电话或邮件通知发件人并删除本邮件! This e-mail and its attachments contain confidential information from HUAWEI, which is intended only for the person or entity whose address is listed above. Any use of the information contained herein in any way (including, but not limited to, total or partial disclosure, reproduction, or dissemination) by persons other than the intended recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender by phone or email immediately and delete it! _______________________________________________ User-committee mailing list User-committee at lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/user-committee -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: image001.png Type: image/png Size: 5474 bytes Desc: image001.png URL: From gmann at ghanshyammann.com Sat May 19 14:24:59 2018 From: gmann at ghanshyammann.com (Ghanshyam Mann) Date: Sat, 19 May 2018 23:24:59 +0900 Subject: [Openstack-operators] [openstack-dev] [openstack-operators][qa] Tempest removal of test_get_service_by_service_and_host_name Message-ID: Hi All, Patch https://review.openstack.org/#/c/569112/1 removed the test_get_service_by_service_and_host_name from tempest tree which looks ok as per bug and commit msg. This satisfy the condition of test removal as per process [1] and this mail is to complete the test removal process to check the external usage of this test. There is one place this tests is listed in Trio2o doc, i have raised patch in Trio2o to remove that to avoid any future confusion[2]. If this test is required by anyone, please respond to this mail, otherwise we are good here. 
..1 https://docs.openstack.org/tempest/latest/test_removal.html ..2 https://review.openstack.org/#/c/569568/ -gmann From amy at demarco.com Sun May 20 16:18:22 2018 From: amy at demarco.com (Amy Marrich) Date: Sun, 20 May 2018 09:18:22 -0700 Subject: [Openstack-operators] OPs and User Sessions at Summit Message-ID: <614A0623-1D14-43FA-8EC4-AE71FCEA4BCD@demarco.com> Hi everyone, There are a lot of great events and sessions going on at Summit next week that I wanted to bring your attention to! Forum sessions are extremely important for starting and continuing conversations and really are a can't miss! For those of you who may not know what the forum is, it’s the opportunities for operators and developers to gather together to discuss requirements for the release, provide feedback and have strategic discussions. Amy (spotz) User Committee Diversity WG Chair Sunday, May 20 - All day Board Meeting Sunday, May 20 @6:00 - 8:00pm https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21791/weareopenstack-diversity-happy-hour-sponsored-by-red-hat-rsvp-required - Diversity Happy Hour Monday, May 21 @2:10 - 2:30 https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21862/state-of-the-user-committee - Lightning Talk by Melvin & Matt Monday, May 21 @ 2:20 https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21712/first-contact-sig-operator-inclusion - First Contact SIG Operator Inclusion Monday, May 21 @ 4:20 https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21747/opsdevs-one-community - Ops/Devs One Community Monday, May 21 @5:10pm - 5: https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21788/ops-meetups-team-catch-up-and-ptg-merger-discussion - PTG Merger Discussion Monday, May 21, 6:00pm-7:00pm https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21865/ambassador-meet-and-greet-at-the-openinfra-mixer - Meet ambassadors. Have cocktails. Tuesday, May 22, 1:50 https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21785/upgrading-openstack-war-stories - Upgrading War Stories, Wednesday, May 23 @ 5:30 https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21725/openstack-operators-community-documentation - OpenStack Operators Community Documentation Thursday May 24 @ 9:00 and 9:50amam https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21721/extended-maintenance-part-i-past-present-and-future - Extended Maintenance: Parts I and II -------------- next part -------------- An HTML attachment was scrubbed... URL: From john.studarus at openstacksandiego.org Sun May 20 20:28:09 2018 From: john.studarus at openstacksandiego.org (John Studarus) Date: Sun, 20 May 2018 13:28:09 -0700 Subject: [Openstack-operators] OpenStack US & CA speaker opportunities Message-ID: <1637f3cef87.11d475b0f196695.942736707121116595@openstacksandiego.org> Dear OpenStack PTLs, devs, operators, and community leaders, We're reaching out to those interested in presenting at events across the US & Canada. The first opportunity is this July 10th, at the Intel Campus in Santa Clara, CA. The SF Bay Area OpenStack group is organizing a half day of presentations and labs with an evening social event to showcase Open Infrastructure and Cloud Native technologies (like Containers, and SDN). We have a number of invited, sponsored breakout sessions and lightening talks available. If you're interested, feel free to contact us directly via email or at the submission page below. 
https://www.papercall.io/openstack-8th-san-jose We're also happy to co-ordinate events with the Meetup groups across the US and Canada. If you're looking to get out and talk, just drop us a note and we can co-ordinate which groups would be convenient to you. Perhaps you'll be traveling and have the evening free to speak to a local group? We can make it happen! All three of us will be in Vancouver this week if you'd like to talk in person. John, Lisa, & Stacy OpenStack Ambassadors for North America and Canada ---- John Studarus - OpenStack Ambassador - John at OpenStackSanDiego.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From rico.lin.guanyu at gmail.com Mon May 21 09:15:33 2018 From: rico.lin.guanyu at gmail.com (Rico Lin) Date: Mon, 21 May 2018 17:15:33 +0800 Subject: [Openstack-operators] [openstack-dev][heat] Heat sessions in Vancouver summit!! And they're all in Tuesday! Message-ID: Dear all As Summit is about to start, looking forward to meet all of you here. Don't miss out sessions from Heat team. They're all on Tuesday! Feel free to let me know if you hope to see anything or learn anything from sessions. Will try my best to prepare it for you. *Tuesday 229:00am - 9:40am Users & Ops feedback for Heat * Vancouver Convention Centre West - Level Two - Room 220 https://www.openstack.org/summit/vancouver-2018/summit- schedule/events/21713/users-and-ops-feedback-for-heat *11:00am - 11:20am Heat - Project Update* Vancouver Convention Centre West - Level Two - Room 212 https://www.openstack.org/summit/vancouver-2018/summit- schedule/events/21595/heat-project-update *1:50pm - 2:30pm Heat - Project Onboarding* Vancouver Convention Centre West - Level Two - Room 223 https://www.openstack.org/summit/vancouver-2018/summit- schedule/events/21629/heat-project-onboarding See you all on Tuesday!! -- May The Force of OpenStack Be With You, *Rico Lin*irc: ricolin -------------- next part -------------- An HTML attachment was scrubbed... URL: From Eric.Smith at ccur.com Mon May 21 18:51:05 2018 From: Eric.Smith at ccur.com (Smith, Eric) Date: Mon, 21 May 2018 18:51:05 +0000 Subject: [Openstack-operators] Multiple Ceph pools for Nova? Message-ID: I have 2 Ceph pools, one backed by SSDs and one backed by spinning disks (Separate roots within the CRUSH hierarchy). I’d like to run all instances in a single project / tenant on SSDs and the rest on spinning disks. How would I go about setting this up? -------------- next part -------------- An HTML attachment was scrubbed... URL: From emccormick at cirrusseven.com Mon May 21 19:17:24 2018 From: emccormick at cirrusseven.com (Erik McCormick) Date: Mon, 21 May 2018 12:17:24 -0700 Subject: [Openstack-operators] Multiple Ceph pools for Nova? In-Reply-To: References: Message-ID: Do you have enough hypervisors you can dedicate some to each purpose? You could make two availability zones each with a different backend. On Mon, May 21, 2018, 11:52 AM Smith, Eric wrote: > I have 2 Ceph pools, one backed by SSDs and one backed by spinning disks > (Separate roots within the CRUSH hierarchy). I’d like to run all instances > in a single project / tenant on SSDs and the rest on spinning disks. How > would I go about setting this up? > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -------------- next part -------------- An HTML attachment was scrubbed... 
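If the two-backend split is done with availability zones as suggested above, the wiring is just a pair of host aggregates with zone names attached, plus per-host libvirt RBD settings. The following is a rough sketch only; the aggregate, zone, host and pool names are made-up placeholders, and the nova.conf fragment shows the commonly used option names rather than anything quoted from this thread.

# Two AZs, one per Ceph backend (names are illustrative only).
openstack aggregate create --zone az-ssd  agg-ssd
openstack aggregate create --zone az-sata agg-sata
openstack aggregate add host agg-ssd  compute01
openstack aggregate add host agg-sata compute11

# On each group of computes, nova.conf points ephemeral disks at its pool:
#   [libvirt]
#   images_type = rbd
#   images_rbd_pool = vms-ssd        # vms-sata on the spinning-disk hosts
#   images_rbd_ceph_conf = /etc/ceph/ceph.conf

# Instances then land on the matching backend by zone:
openstack server create --availability-zone az-ssd \
  --flavor m1.medium --image cirros --network private vm-on-ssd

The aggregate-plus-flavor variant discussed elsewhere in the thread works the same way, except the aggregate carries metadata that is matched by a flavor extra spec (via the AggregateInstanceExtraSpecsFilter) instead of a zone name.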
URL: From guilherme.pimentel at ccc.ufcg.edu.br Mon May 21 19:20:07 2018 From: guilherme.pimentel at ccc.ufcg.edu.br (Guilherme Steinmuller Pimentel Pimentel) Date: Mon, 21 May 2018 16:20:07 -0300 Subject: [Openstack-operators] Multiple Ceph pools for Nova? In-Reply-To: References: Message-ID: I usually separate things using host aggregate feature. In my deployment, I have 2 different nova pools. So, in nova.conf, I define the *images_rbd_pool* variable point to desired pool and then, I create an aggregate and put these computes into it. The flavor extra_spec metadata will define which aggregate the instance will be scheduled. 2018-05-21 15:51 GMT-03:00 Smith, Eric : > I have 2 Ceph pools, one backed by SSDs and one backed by spinning disks > (Separate roots within the CRUSH hierarchy). I’d like to run all instances > in a single project / tenant on SSDs and the rest on spinning disks. How > would I go about setting this up? > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From guilherme.pimentel at ccc.ufcg.edu.br Mon May 21 19:31:40 2018 From: guilherme.pimentel at ccc.ufcg.edu.br (Guilherme Steinmuller Pimentel Pimentel) Date: Mon, 21 May 2018 16:31:40 -0300 Subject: [Openstack-operators] Multiple Ceph pools for Nova? In-Reply-To: References: Message-ID: 2018-05-21 16:17 GMT-03:00 Erik McCormick : > Do you have enough hypervisors you can dedicate some to each purpose? You > could make two availability zones each with a different backend. > I have about 20 hypervisors. Ten are using a nova pool with SAS disks and the other 10 are using another pool using SATA disks. Yes, making two availability zones is an option. I didn't dive deep into it when I was planning the deployment, so I am using the default nova availability zone and defining which pool to use by flavor/aggregate metadata. > > On Mon, May 21, 2018, 11:52 AM Smith, Eric wrote: > >> I have 2 Ceph pools, one backed by SSDs and one backed by spinning disks >> (Separate roots within the CRUSH hierarchy). I’d like to run all instances >> in a single project / tenant on SSDs and the rest on spinning disks. How >> would I go about setting this up? >> _______________________________________________ >> OpenStack-operators mailing list >> OpenStack-operators at lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators >> > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Tue May 22 04:51:36 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Mon, 21 May 2018 21:51:36 -0700 Subject: [Openstack-operators] Multiple Ceph pools for Nova? In-Reply-To: References: Message-ID: <66b14191-cd3b-b455-3be9-80cf3629517e@gmail.com> On 5/21/2018 11:51 AM, Smith, Eric wrote: > I have 2 Ceph pools, one backed by SSDs and one backed by spinning disks > (Separate roots within the CRUSH hierarchy). I’d like to run all > instances in a single project / tenant on SSDs and the rest on spinning > disks. How would I go about setting this up? 
As mentioned elsewhere, host aggregate would work for the compute hosts connected to each storage pool. Then you can have different flavors per aggregate and charge more for the SSD flavors or restrict the aggregates based on tenant [1]. Alternatively, if this is something you plan to eventually scale to a larger size, you could even separate the pools with separate cells and use resource provider aggregates in placement to mirror the host aggregates for tenant-per-cell filtering [2]. It sounds like this is very similar to what CERN does (cells per hardware characteristics and projects assigned to specific cells). So Belmiro could probably help give some guidance here too. Check out the talk he gave today at the summit [3]. [1] https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#aggregatemultitenancyisolation [2] https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#tenant-isolation-with-placement [3] https://www.openstack.org/videos/vancouver-2018/moving-from-cellsv1-to-cellsv2-at-cern -- Thanks, Matt From zioproto at gmail.com Tue May 22 13:29:45 2018 From: zioproto at gmail.com (Saverio Proto) Date: Tue, 22 May 2018 15:29:45 +0200 Subject: [Openstack-operators] attaching network cards to VMs taking a very long time In-Reply-To: <350f070b9d654a0a5430fafb07bcc1d41c98d2f8.camel@emag.ro> References: <715adc7d-64f6-9545-1bf6-5eb13fb1d991@gmail.com> <350f070b9d654a0a5430fafb07bcc1d41c98d2f8.camel@emag.ro> Message-ID: Hello Radu, do you have the Openstack rootwrap configured to work in daemon mode ? please read this article: 2018-05-18 10:21 GMT+02:00 Radu Popescu | eMAG, Technology : > Hi, > > so, nova says the VM is ACTIVE and actually boots with no network. We are > setting some metadata that we use later on and have cloud-init for different > tasks. > So, VM is up, OS is running, but network is working after a random amount of > time, that can get to around 45 minutes. Thing is, is not happening to all > VMs in that test (around 300), but it's happening to a fair amount - around > 25%. > > I can see the callback coming few seconds after neutron openvswitch agent > says it's completed the setup. My question is, why is it taking so long for > nova openvswitch agent to configure the port? I can see the port up in both > host OS and openvswitch. I would assume it's doing the whole namespace and > iptables setup. But still, 30 minutes? Seems a lot! > > Thanks, > Radu > > On Thu, 2018-05-17 at 11:50 -0400, George Mihaiescu wrote: > > We have other scheduled tests that perform end-to-end (assign floating IP, > ssh, ping outside) and never had an issue. > I think we turned it off because the callback code was initially buggy and > nova would wait forever while things were in fact ok, but I'll change > "vif_plugging_is_fatal = True" and "vif_plugging_timeout = 300" and run > another large test, just to confirm. > > We usually run these large tests after a version upgrade to test the APIs > under load. > > > > On Thu, May 17, 2018 at 11:42 AM, Matt Riedemann > wrote: > > On 5/17/2018 9:46 AM, George Mihaiescu wrote: > > and large rally tests of 500 instances complete with no issues. > > > Sure, except you can't ssh into the guests. > > The whole reason the vif plugging is fatal and timeout and callback code was > because the upstream CI was unstable without it. The server would report as > ACTIVE but the ports weren't wired up so ssh would fail. Having an ACTIVE > guest that you can't actually do anything with is kind of pointless. 
> > _______________________________________________ > > OpenStack-operators mailing list > > OpenStack-operators at lists.openstack.org > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > From zioproto at gmail.com Tue May 22 13:30:24 2018 From: zioproto at gmail.com (Saverio Proto) Date: Tue, 22 May 2018 15:30:24 +0200 Subject: [Openstack-operators] attaching network cards to VMs taking a very long time In-Reply-To: References: <715adc7d-64f6-9545-1bf6-5eb13fb1d991@gmail.com> <350f070b9d654a0a5430fafb07bcc1d41c98d2f8.camel@emag.ro> Message-ID: Sorry email went out incomplete. Read this: https://cloudblog.switch.ch/2017/08/28/starting-1000-instances-on-switchengines/ make sure that Openstack rootwrap configured to work in daemon mode Thank you Saverio 2018-05-22 15:29 GMT+02:00 Saverio Proto : > Hello Radu, > > do you have the Openstack rootwrap configured to work in daemon mode ? > > please read this article: > > 2018-05-18 10:21 GMT+02:00 Radu Popescu | eMAG, Technology > : >> Hi, >> >> so, nova says the VM is ACTIVE and actually boots with no network. We are >> setting some metadata that we use later on and have cloud-init for different >> tasks. >> So, VM is up, OS is running, but network is working after a random amount of >> time, that can get to around 45 minutes. Thing is, is not happening to all >> VMs in that test (around 300), but it's happening to a fair amount - around >> 25%. >> >> I can see the callback coming few seconds after neutron openvswitch agent >> says it's completed the setup. My question is, why is it taking so long for >> nova openvswitch agent to configure the port? I can see the port up in both >> host OS and openvswitch. I would assume it's doing the whole namespace and >> iptables setup. But still, 30 minutes? Seems a lot! >> >> Thanks, >> Radu >> >> On Thu, 2018-05-17 at 11:50 -0400, George Mihaiescu wrote: >> >> We have other scheduled tests that perform end-to-end (assign floating IP, >> ssh, ping outside) and never had an issue. >> I think we turned it off because the callback code was initially buggy and >> nova would wait forever while things were in fact ok, but I'll change >> "vif_plugging_is_fatal = True" and "vif_plugging_timeout = 300" and run >> another large test, just to confirm. >> >> We usually run these large tests after a version upgrade to test the APIs >> under load. >> >> >> >> On Thu, May 17, 2018 at 11:42 AM, Matt Riedemann >> wrote: >> >> On 5/17/2018 9:46 AM, George Mihaiescu wrote: >> >> and large rally tests of 500 instances complete with no issues. >> >> >> Sure, except you can't ssh into the guests. >> >> The whole reason the vif plugging is fatal and timeout and callback code was >> because the upstream CI was unstable without it. The server would report as >> ACTIVE but the ports weren't wired up so ssh would fail. Having an ACTIVE >> guest that you can't actually do anything with is kind of pointless. 
>> >> _______________________________________________ >> >> OpenStack-operators mailing list >> >> OpenStack-operators at lists.openstack.org >> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators >> >> >> >> _______________________________________________ >> OpenStack-operators mailing list >> OpenStack-operators at lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators >> From ignaziocassano at gmail.com Tue May 22 13:32:36 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Tue, 22 May 2018 15:32:36 +0200 Subject: [Openstack-operators] community vs founation membership Message-ID: Hi all, please, what's the difference between community and foundation membership ? Regards Ignazio -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Tue May 22 13:56:02 2018 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 22 May 2018 13:56:02 +0000 Subject: [Openstack-operators] community vs founation membership In-Reply-To: References: Message-ID: <20180522135602.rotcqcy6pnx2sork@yuggoth.org> On 2018-05-22 15:32:36 +0200 (+0200), Ignazio Cassano wrote: > please, what's the difference between community and foundation > membership ? The "community" setting is just a means of indicating that you have a profile/account for any of various purposes (scheduling, speaker submissions, et cetera) but are not officially an Individual Member of the OpenStack Foundation. A foundation membership is necessary for some official activities, particularly for participating in elections (board of directors, user committee, technical committee, project team lead) as either a candidate or voter. Joining the OpenStack Foundation as an Individual Member comes with no cost other than a minute or two of your time to provide contact information at https://www.openstack.org/join/ but does obligate you to at least vote in OpenStack Foundation Board of Directors elections once you are eligible to do so. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From Eric.Smith at ccur.com Tue May 22 13:57:47 2018 From: Eric.Smith at ccur.com (Smith, Eric) Date: Tue, 22 May 2018 13:57:47 +0000 Subject: [Openstack-operators] Multiple Ceph pools for Nova? In-Reply-To: <66b14191-cd3b-b455-3be9-80cf3629517e@gmail.com> References: <66b14191-cd3b-b455-3be9-80cf3629517e@gmail.com> Message-ID: Thanks everyone for the feedback - I have a pretty small environment (11 nodes) and I was able to find the compute / volume pool segregation within nova.conf / cinder.conf. I think I should be able to just export / import my existing RBDs from the spinning disk compute pool to the SSD compute pool and update my nova.conf. Then I'll add an extra backend in cinder.conf to point new volumes to the SSD volumes pool. Thanks for all the help again. Eric On 5/22/18, 12:53 AM, "Matt Riedemann" wrote: On 5/21/2018 11:51 AM, Smith, Eric wrote: > I have 2 Ceph pools, one backed by SSDs and one backed by spinning disks > (Separate roots within the CRUSH hierarchy). I’d like to run all > instances in a single project / tenant on SSDs and the rest on spinning > disks. How would I go about setting this up? As mentioned elsewhere, host aggregate would work for the compute hosts connected to each storage pool. 
Then you can have different flavors per aggregate and charge more for the SSD flavors or restrict the aggregates based on tenant [1]. Alternatively, if this is something you plan to eventually scale to a larger size, you could even separate the pools with separate cells and use resource provider aggregates in placement to mirror the host aggregates for tenant-per-cell filtering [2]. It sounds like this is very similar to what CERN does (cells per hardware characteristics and projects assigned to specific cells). So Belmiro could probably help give some guidance here too. Check out the talk he gave today at the summit [3]. [1] https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#aggregatemultitenancyisolation [2] https://docs.openstack.org/nova/latest/admin/configuration/schedulers.html#tenant-isolation-with-placement [3] https://www.openstack.org/videos/vancouver-2018/moving-from-cellsv1-to-cellsv2-at-cern -- Thanks, Matt _______________________________________________ OpenStack-operators mailing list OpenStack-operators at lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators From ignaziocassano at gmail.com Tue May 22 14:06:43 2018 From: ignaziocassano at gmail.com (Ignazio Cassano) Date: Tue, 22 May 2018 16:06:43 +0200 Subject: [Openstack-operators] community vs founation membership In-Reply-To: <20180522135602.rotcqcy6pnx2sork@yuggoth.org> References: <20180522135602.rotcqcy6pnx2sork@yuggoth.org> Message-ID: Hi Jeremy, thanks for your help. I am interested in openstack testing (no code contributing). Becoming community member give me any advantage ? At this time I am testing on ocata on centos 7. My environment is in HA with pacemaker (3 controllers) and 5 kvm nodes. Regards Ignazio 2018-05-22 15:56 GMT+02:00 Jeremy Stanley : > On 2018-05-22 15:32:36 +0200 (+0200), Ignazio Cassano wrote: > > please, what's the difference between community and foundation > > membership ? > > The "community" setting is just a means of indicating that you have > a profile/account for any of various purposes (scheduling, speaker > submissions, et cetera) but are not officially an Individual Member > of the OpenStack Foundation. A foundation membership is necessary > for some official activities, particularly for participating in > elections (board of directors, user committee, technical committee, > project team lead) as either a candidate or voter. Joining the > OpenStack Foundation as an Individual Member comes with no cost > other than a minute or two of your time to provide contact > information at https://www.openstack.org/join/ but does obligate you > to at least vote in OpenStack Foundation Board of Directors > elections once you are eligible to do so. > -- > Jeremy Stanley > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fungi at yuggoth.org Tue May 22 14:48:26 2018 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 22 May 2018 14:48:26 +0000 Subject: [Openstack-operators] community vs founation membership In-Reply-To: References: <20180522135602.rotcqcy6pnx2sork@yuggoth.org> Message-ID: <20180522144826.nhal6rbdxdoreaaa@yuggoth.org> On 2018-05-22 16:06:43 +0200 (+0200), Ignazio Cassano wrote: > I am interested in openstack testing (no code contributing). 
> Becoming community member give me any advantage ?
> At this time I am testing on ocata on centos 7.
> My environment is in HA with pacemaker (3 controllers) and 5 kvm nodes.

If you wish to participate in the periodic OpenStack User Survey to provide details on your test deployment and any related experiences running OpenStack, then I think you'd need to create an account for https://www.openstack.org/ (so at least "community" level) to be able to do so. Participation in the survey is not mandatory, but still much appreciated as it helps the OpenStack project contributors better determine where improvements are most needed. -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From jon at csail.mit.edu Tue May 22 15:02:34 2018 From: jon at csail.mit.edu (Jonathan D. Proulx) Date: Tue, 22 May 2018 08:02:34 -0700 Subject: [Openstack-operators] community vs founation membership In-Reply-To: References: <20180522135602.rotcqcy6pnx2sork@yuggoth.org> Message-ID: <20180522150234.fuwrqg2dzvwz33hy@csail.mit.edu>

On Tue, May 22, 2018 at 04:06:43PM +0200, Ignazio Cassano wrote:
: Hi Jeremy, thanks for your help.
: I am interested in openstack testing (no code contributing).
: Becoming community member give me any advantage ?
: At this time I am testing on ocata on centos 7.
: My environment is in HA with pacemaker (3 controllers) and 5 kvm nodes.

Anyone can report bugs of course. If you became a foundation member you could also comment on proposed fixes during code review even if you're not contributing the code yourself. Code review is a valuable service to the community, and something most projects are usually looking for more of, if that is something you're comfortable with. Personally, it's about where my Python skills fall. I can read enough to see if a fix doesn't quite fix what I'm looking for or sometimes if it has adverse side effects for me.

Again, seeing these reviews is public, but writing reviews requires foundation membership in the same way being a code contributor does.

If I recall, the "community" level was meant mostly for people who could not sign the contributor agreement, typically because of employer policies. If that's a concern, then become a "community" member; if not, joining as a "foundation" member would be my advice.

Welcome,
-Jon

From fungi at yuggoth.org Tue May 22 15:12:28 2018 From: fungi at yuggoth.org (Jeremy Stanley) Date: Tue, 22 May 2018 15:12:28 +0000 Subject: [Openstack-operators] community vs founation membership In-Reply-To: <20180522150234.fuwrqg2dzvwz33hy@csail.mit.edu> References: <20180522135602.rotcqcy6pnx2sork@yuggoth.org> <20180522150234.fuwrqg2dzvwz33hy@csail.mit.edu> Message-ID: <20180522151228.qevzpobtpumldspq@yuggoth.org>

On 2018-05-22 08:02:34 -0700 (-0700), Jonathan D. Proulx wrote:
[...]
> Again, seeing these reviews is public, but writing reviews requires
> foundation membership in the same way being a code contributor does.
[...]

In fact, commenting on https://review.openstack.org/ only requires creating an account at https://login.ubuntu.com/ (the OpenID service we're presently using for that) and logging in with it. Further, we dropped the need to become a member of the OpenStack Foundation in order to submit patches for review (it was never a legal requirement, but only a quirk of how we were previously linking accounts together to simplify technical elections).
Contributing patches to most OpenStack projects does require agreeing to the OpenStack Individual Contributor License Agreement (ICLA) in Gerrit for now, but this is not the same thing as becoming an Individual Member of the OpenStack Foundation and doesn't even require any account on www.openstack.org for now, just login.ubuntu.com (this will likely change in the future when we eventually switch OpenID providers). -- Jeremy Stanley -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From jon at csail.mit.edu Tue May 22 15:26:50 2018 From: jon at csail.mit.edu (Jonathan D. Proulx) Date: Tue, 22 May 2018 08:26:50 -0700 Subject: [Openstack-operators] community vs founation membership In-Reply-To: <20180522151228.qevzpobtpumldspq@yuggoth.org> References: <20180522135602.rotcqcy6pnx2sork@yuggoth.org> <20180522150234.fuwrqg2dzvwz33hy@csail.mit.edu> <20180522151228.qevzpobtpumldspq@yuggoth.org> Message-ID: <20180522152650.pyui54fh4m5amzct@csail.mit.edu> On Tue, May 22, 2018 at 03:12:28PM +0000, Jeremy Stanley wrote: :On 2018-05-22 08:02:34 -0700 (-0700), Jonathan D. Proulx wrote: :[...] :> Again seeing these reviews is public but writing reviews requires :> foundation memebership in the same way being a code contributor does. :[...] : :In fact, commenting on https://review.openstack.org/ only requires :creating an account at https://login.ubuntu.com/ (the OpenID service :we're presently using for that) and logging in with it. Further, we :dropped the need to become a member of the OpenStack Foundation in :order to submit patches for review (it was never a legal :requirement, but only a quirk of how we were previously linking :accounts together to simplify technical elections). Contributing :patches to most OpenStack projects does require agreeing to the :OpenStack Individual Contributor License Agreement (ICLA) in Gerrit :for now, but this is not the same thing as becoming an Individual :Member of the OpenStack Foundation and doesn't even require any :account on www.openstack.org for now, just login.ubuntu.com (this :will likely change in the future when we eventually switch OpenID :providers). Excellent! Glad to stand corrected, lower barriers are better barriers :) -Jon -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 963 bytes Desc: not available URL: From sundar.nadathur at intel.com Tue May 22 22:06:55 2018 From: sundar.nadathur at intel.com (Nadathur, Sundar) Date: Tue, 22 May 2018 15:06:55 -0700 Subject: [Openstack-operators] Followup to Cyborg/FPGA discussion at OpenStack Summit Message-ID: Hello operators,    We had a good discussion at the OpenStack Summit at Vancouver [1] on Cyborg/FPGA for Cloud/NFV. Cyborg [2] is the OpenStack project for life cycle management of accelerators, including GPUs and FPGAs. Thanks to those of you who attended. The discussion during the session has been captured in this etherpad [3]. Please feel free to respond on the etherpad. If you are interested in a follow-up discussion, please indicate in the same etherpad. [1]https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21720/cyborgfpga-support-for-cloudnfv [2] https://wiki.openstack.org/wiki/Cyborg [3] https://etherpad.openstack.org/p/Cyborg-FPGA-Support-for-Cloud-NFV Thanks. Regards, Sundar -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From radu.popescu at emag.ro Wed May 23 10:08:18 2018 From: radu.popescu at emag.ro (Radu Popescu | eMAG, Technology) Date: Wed, 23 May 2018 10:08:18 +0000 Subject: [Openstack-operators] attaching network cards to VMs taking a very long time In-Reply-To: References: <715adc7d-64f6-9545-1bf6-5eb13fb1d991@gmail.com> <350f070b9d654a0a5430fafb07bcc1d41c98d2f8.camel@emag.ro> Message-ID: <43cd8579c761a13dcc81e6ffc9a69089fb421cda.camel@emag.ro> Hi, actually, I didn't know about that option. I'll enable it right now. Testing is done every morning at about 4:00AM ..so I'll know tomorrow morning if it changed anything. Thanks, Radu On Tue, 2018-05-22 at 15:30 +0200, Saverio Proto wrote: Sorry email went out incomplete. Read this: https://cloudblog.switch.ch/2017/08/28/starting-1000-instances-on-switchengines/ make sure that Openstack rootwrap configured to work in daemon mode Thank you Saverio 2018-05-22 15:29 GMT+02:00 Saverio Proto >: Hello Radu, do you have the Openstack rootwrap configured to work in daemon mode ? please read this article: 2018-05-18 10:21 GMT+02:00 Radu Popescu | eMAG, Technology >: Hi, so, nova says the VM is ACTIVE and actually boots with no network. We are setting some metadata that we use later on and have cloud-init for different tasks. So, VM is up, OS is running, but network is working after a random amount of time, that can get to around 45 minutes. Thing is, is not happening to all VMs in that test (around 300), but it's happening to a fair amount - around 25%. I can see the callback coming few seconds after neutron openvswitch agent says it's completed the setup. My question is, why is it taking so long for nova openvswitch agent to configure the port? I can see the port up in both host OS and openvswitch. I would assume it's doing the whole namespace and iptables setup. But still, 30 minutes? Seems a lot! Thanks, Radu On Thu, 2018-05-17 at 11:50 -0400, George Mihaiescu wrote: We have other scheduled tests that perform end-to-end (assign floating IP, ssh, ping outside) and never had an issue. I think we turned it off because the callback code was initially buggy and nova would wait forever while things were in fact ok, but I'll change "vif_plugging_is_fatal = True" and "vif_plugging_timeout = 300" and run another large test, just to confirm. We usually run these large tests after a version upgrade to test the APIs under load. On Thu, May 17, 2018 at 11:42 AM, Matt Riedemann > wrote: On 5/17/2018 9:46 AM, George Mihaiescu wrote: and large rally tests of 500 instances complete with no issues. Sure, except you can't ssh into the guests. The whole reason the vif plugging is fatal and timeout and callback code was because the upstream CI was unstable without it. The server would report as ACTIVE but the ports weren't wired up so ssh would fail. Having an ACTIVE guest that you can't actually do anything with is kind of pointless. _______________________________________________ OpenStack-operators mailing list OpenStack-operators at lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators _______________________________________________ OpenStack-operators mailing list OpenStack-operators at lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators -------------- next part -------------- An HTML attachment was scrubbed... URL: From jon at csail.mit.edu Wed May 23 20:41:29 2018 From: jon at csail.mit.edu (Jonathan D. 
Proulx) Date: Wed, 23 May 2018 13:41:29 -0700 Subject: [Openstack-operators] OSA migrating existing deployment to OSA Message-ID: <20180523204129.nzexkf4vojqiohlb@csail.mit.edu>

Hi All,

Having attended the Fast Forward Upgrade session and Upgrade SIG this morning in Vancouver, I'm convinced my upgrade problem is really a config management problem.

I'm running an old, deprecated config system that worked great in 2012 but has aged poorly. I've been meaning to move to OSA for a very long time, but with a small, fragmented team and a generally working cloud, something else always grabs priority. This also seems a more common rut than I realized, based on what I heard this morning.

I very much need to make my move over the next 3-4 months. If other people are looking to make a similar migration to OSA (or have recently completed one) I'd love to work together on documenting (if not codifying) mapping an existing cloud to OSA config. Obviously brownfield deployments are messy with site-specific oddities all over the place, but if I'm going to suffer, hopefully I can spare others some of that suffering.

Anyone else crazy enough to get in this boat with me?

-Jon

From sagaray at nttdata.co.jp Wed May 23 21:34:09 2018 From: sagaray at nttdata.co.jp (sagaray at nttdata.co.jp) Date: Wed, 23 May 2018 21:34:09 +0000 Subject: [Openstack-operators] Need feedback for nova aborting cold migration function In-Reply-To: <9b1c9c3d-00dc-d073-96e7-4d6409521261@gmail.com> References: <1525919628734.2105@nttdata.co.jp> <5fea9373-021a-0a2e-ba91-d7fe62bd5ca9@gmail.com> <1526374144863.89140@nttdata.co.jp>, <9b1c9c3d-00dc-d073-96e7-4d6409521261@gmail.com> Message-ID: <3231666a74104fae802f064bfa8ce88f@MP-MSGSS-MBX017.msg.nttdata.co.jp>

Hi Matt,

> > We store the service logs which are created by VM on that storage.
>
> I don't mean to be glib, but have you considered maybe not doing that?

The load issue on storage is due to the way we deploy our business software on VMs. The best way would be to introduce new storage and separate the SAN, but we cannot change our deployment method due to its cost and other limitations. In the long term, our operations team will change the deployment method to a better one to resolve this problem.

On the other hand, we would like to build a tool to support VM migration that is unaware of which migration method (cold or live) is used. Feature-parity-wise, if live migration supports a cancel feature, then we think that cold migration must support it as well.

--------------------------------------------------
Yukinori Sagara
Platform Engineering Department, NTT DATA Corp.

________________________________________
From: Matt Riedemann
Sent: May 18, 2018 1:39
To: openstack-operators at lists.openstack.org
Subject: Re: [Openstack-operators] Need feedback for nova aborting cold migration function

On 5/15/2018 3:48 AM, sagaray at nttdata.co.jp wrote:
> We store the service logs which are created by VM on that storage.

I don't mean to be glib, but have you considered maybe not doing that?
-- Thanks, Matt _______________________________________________ OpenStack-operators mailing list OpenStack-operators at lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators From gael.therond at gmail.com Wed May 23 21:59:37 2018 From: gael.therond at gmail.com (Flint WALRUS) Date: Wed, 23 May 2018 23:59:37 +0200 Subject: [Openstack-operators] Need feedback for nova aborting cold migration function In-Reply-To: <3231666a74104fae802f064bfa8ce88f@MP-MSGSS-MBX017.msg.nttdata.co.jp> References: <1525919628734.2105@nttdata.co.jp> <5fea9373-021a-0a2e-ba91-d7fe62bd5ca9@gmail.com> <1526374144863.89140@nttdata.co.jp> <9b1c9c3d-00dc-d073-96e7-4d6409521261@gmail.com> <3231666a74104fae802f064bfa8ce88f@MP-MSGSS-MBX017.msg.nttdata.co.jp> Message-ID: We are using multiple storage backend / topology on our side ranging from ScaleIO to CEPH passing by local compute host storage (were we need cold storage) and VNX, I have to said that CEPH is our best bet. Since we use it we clearly reduced our outages, allowed our user advanced features such as live-migration, boot from volumes and on top of that a better and more reliable performance. Yet we still need to get live and cold migration the same features set as our users/customers are really expecting us to provide a seamless experience between options. I can’t really speak out about real numbers but I’m within the video game industry if that help to drive support and traction/interest. Thanks for the survey btw. Kind regards, Gaël. Le mer. 23 mai 2018 à 23:36, a écrit : > Hi Matt, > > > > We store the service logs which are created by VM on that storage. > > > > I don't mean to be glib, but have you considered maybe not doing that? > > The load issue on storage is due to the way we deploy our business > softwares on VM. > The best way is introducing a new storage and separate the SAN, but we > cannot change our deployment method due to it's cost and other limitations. > On a long-term, our operation team will change the deployment method to > better one to resolve this problem. > > On the other hand, we would like to build a tool to support VM migration > that is unaware of which migration method is used for VM migration (Cold or > Live). Feature parity wise, if live migration supports cancel feature, then > we think that cold migration must support it as well. > > -------------------------------------------------- > Yukinori Sagara > Platform Engineering Department, NTT DATA Corp. > > ________________________________________ > 差出人: Matt Riedemann > 送信日時: 2018年5月18日 1:39 > 宛先: openstack-operators at lists.openstack.org > 件名: Re: [Openstack-operators] Need feedback for nova aborting cold > migration function > > On 5/15/2018 3:48 AM, sagaray at nttdata.co.jp wrote: > > We store the service logs which are created by VM on that storage. > > I don't mean to be glib, but have you considered maybe not doing that? > > -- > > Thanks, > > Matt > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From openstack at medberry.net Wed May 23 22:09:40 2018 From: openstack at medberry.net (David Medberry) Date: Wed, 23 May 2018 15:09:40 -0700 Subject: [Openstack-operators] Fwd: Follow Up: Private Enterprise Cloud Issues In-Reply-To: References: Message-ID: There was a great turnout at the Private Enterprise Cloud Issues session here in Vancouver. I'll propose a follow-on discussion for Denver PTG as well as trying to sift the data a bit and pre-populate. Look for that sifted data soon. For folks unable to participate locally, the etherpad is here: https://etherpad.openstack.org/p/YVR-private-enterprise-cloud-issues (and I've cached a copy offline in case it gets reset/etc.) -- -dave -------------- next part -------------- An HTML attachment was scrubbed... URL: From ekcs.openstack at gmail.com Wed May 23 23:39:16 2018 From: ekcs.openstack at gmail.com (Eric K) Date: Wed, 23 May 2018 16:39:16 -0700 Subject: [Openstack-operators] [self-healing] BoF in Vancouver tomorrow Message-ID: For everyone interested in self-healing infra, come share your experience and your ideas with like-minded stackers, including folks from 10+ projects working together to make OpenStack self-healing a reality! Thursday, May 24, 1:50pm-2:30pm Vancouver Convention Centre West - Level Two - Room 217 https://www.openstack.org/summit/vancouver-2018/summit-schedule/events/21830/self-healing-sig-bof Brainstorming etherpad: https://etherpad.openstack.org/p/YVR-self-healing-brainstorming From mihalis68 at gmail.com Thu May 24 01:38:32 2018 From: mihalis68 at gmail.com (Chris Morgan) Date: Wed, 23 May 2018 18:38:32 -0700 Subject: [Openstack-operators] Ops Community Documentation - first anchor point Message-ID: Hello Everyone, In the Ops Community documentation working session today in Vancouver, we made some really good progress (etherpad here: https://etherpad.openstack.org/p/YVR-Ops-Community-Docs but not all of the good stuff is yet written down). In short, we're going to course correct on maintaining the Operators Guide, the HA Guide and Architecture Guide, not edit-in-place via the wiki and instead try still maintaining them as code, but with a different, new set of owners, possibly in a new Ops-focused repo. There was a strong consensus that a) code workflow >> wiki workflow and that b) openstack core docs tools are just fine. There is a lot still to be decided on how where and when, but we do have an offer of a rewrite of the HA Guide, as long as the changes will be allowed to actually land, so we expect to actually start showing some progress. At the end of the session, people wanted to know how to follow along as various people work out how to do this... and so for now that place is this very email thread. The idea is if the code for those documents goes to live in a different repo, or if new contributors turn up, or if a new version we will announce/discuss it here until such time as we have a better home for this initiative. Cheers Chris -- Chris Morgan -------------- next part -------------- An HTML attachment was scrubbed... URL: From jon at csail.mit.edu Thu May 24 01:46:26 2018 From: jon at csail.mit.edu (Jonathan D. Proulx) Date: Wed, 23 May 2018 18:46:26 -0700 Subject: [Openstack-operators] Ops Community Documentation - first anchor point In-Reply-To: References: Message-ID: <20180524014626.2e3n7kmjxdjb7rjv@csail.mit.edu> Thanks for kicking this off Chris. Were you going to create that new repository? If not I can take on the tasks of learning how and making it happen. 
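For anyone following along who hasn't touched the docs tooling before, "maintaining them as code" with the core docs tools boils down to reStructuredText built by Sphinx through tox, reviewed in Gerrit like any other change. A minimal sketch of what a new ops docs repo would need; the file names follow the usual OpenStack doc/ layout, but nothing here is a decided structure:

  # tox.ini would carry a docs environment roughly like:
  #   [testenv:docs]
  #   deps = -r{toxinidir}/doc/requirements.txt
  #   commands = sphinx-build -b html doc/source doc/build/html
  # with doc/requirements.txt listing sphinx (and most likely openstackdocstheme)
  # and doc/source/index.rst as the entry point. Locally the build is just:
  tox -e docs

The gate job then runs the same build on every proposed change, which is what makes the code-style workflow stick.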
-Jon On Wed, May 23, 2018 at 06:38:32PM -0700, Chris Morgan wrote: : Hello Everyone, : In the Ops Community documentation working session today in Vancouver, : we made some really good progress (etherpad : here: [1]https://etherpad.openstack.org/p/YVR-Ops-Community-Docs but : not all of the good stuff is yet written down). : In short, we're going to course correct on maintaining the Operators : Guide, the HA Guide and Architecture Guide, not edit-in-place via the : wiki and instead try still maintaining them as code, but with a : different, new set of owners, possibly in a new Ops-focused repo. There : was a strong consensus that a) code workflow >> wiki workflow and that : b) openstack core docs tools are just fine. : There is a lot still to be decided on how where and when, but we do : have an offer of a rewrite of the HA Guide, as long as the changes will : be allowed to actually land, so we expect to actually start showing : some progress. : At the end of the session, people wanted to know how to follow along as : various people work out how to do this... and so for now that place is : this very email thread. The idea is if the code for those documents : goes to live in a different repo, or if new contributors turn up, or if : a new version we will announce/discuss it here until such time as we : have a better home for this initiative. : Cheers : Chris : -- : Chris Morgan <[2]mihalis68 at gmail.com> : :References : : 1. https://etherpad.openstack.org/p/YVR-Ops-Community-Docs : 2. mailto:mihalis68 at gmail.com :_______________________________________________ :OpenStack-operators mailing list :OpenStack-operators at lists.openstack.org :http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators From mihalis68 at gmail.com Thu May 24 03:09:05 2018 From: mihalis68 at gmail.com (Chris Morgan) Date: Wed, 23 May 2018 20:09:05 -0700 Subject: [Openstack-operators] Ops Community Documentation - first anchor point In-Reply-To: <20180524014626.2e3n7kmjxdjb7rjv@csail.mit.edu> References: <20180524014626.2e3n7kmjxdjb7rjv@csail.mit.edu> Message-ID: <9187FEF2-6403-42F7-87AC-E160E4529688@gmail.com> I hadn’t got that far in my thoughts. If you’re able to give that a go, then that would be great! Chris Sent from my iPhone > On May 23, 2018, at 6:46 PM, Jonathan D. Proulx wrote: > > > Thanks for kicking this off Chris. > > Were you going to create that new repository? If not I can take on > the tasks of learning how and making it happen. > > -Jon > > On Wed, May 23, 2018 at 06:38:32PM -0700, Chris Morgan wrote: > : Hello Everyone, > : In the Ops Community documentation working session today in Vancouver, > : we made some really good progress (etherpad > : here: [1]https://etherpad.openstack.org/p/YVR-Ops-Community-Docs but > : not all of the good stuff is yet written down). > : In short, we're going to course correct on maintaining the Operators > : Guide, the HA Guide and Architecture Guide, not edit-in-place via the > : wiki and instead try still maintaining them as code, but with a > : different, new set of owners, possibly in a new Ops-focused repo. There > : was a strong consensus that a) code workflow >> wiki workflow and that > : b) openstack core docs tools are just fine. > : There is a lot still to be decided on how where and when, but we do > : have an offer of a rewrite of the HA Guide, as long as the changes will > : be allowed to actually land, so we expect to actually start showing > : some progress. 
> : At the end of the session, people wanted to know how to follow along as > : various people work out how to do this... and so for now that place is > : this very email thread. The idea is if the code for those documents > : goes to live in a different repo, or if new contributors turn up, or if > : a new version we will announce/discuss it here until such time as we > : have a better home for this initiative. > : Cheers > : Chris > : -- > : Chris Morgan <[2]mihalis68 at gmail.com> > : > :References > : > : 1. https://etherpad.openstack.org/p/YVR-Ops-Community-Docs > : 2. mailto:mihalis68 at gmail.com > > :_______________________________________________ > :OpenStack-operators mailing list > :OpenStack-operators at lists.openstack.org > :http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > From doug at doughellmann.com Thu May 24 04:23:53 2018 From: doug at doughellmann.com (Doug Hellmann) Date: Wed, 23 May 2018 21:23:53 -0700 Subject: [Openstack-operators] Ops Community Documentation - first anchor point In-Reply-To: <9187FEF2-6403-42F7-87AC-E160E4529688@gmail.com> References: <20180524014626.2e3n7kmjxdjb7rjv@csail.mit.edu> <9187FEF2-6403-42F7-87AC-E160E4529688@gmail.com> Message-ID: <1527135398-sup-4094@lrrr.local> You will want to follow the steps of the "project creators' guide" [1]. Not all of them apply, because this is a docs repo and not a code project repo. Let me know if you have questions about which pieces do or do not apply as you go along, and we can work on improving that document as well. The openstack/tripleo-docs repo looks like it has a setup similar to the one you'll be creating, so when you get to the steps about setting up jobs you can probably copy what they have. After the session today it occurred to me that there is one governance-related thing that we would need to do in order to publish this content to docs.openstack.org. Right now we have a policy that only official teams can do that. I think if the guide is owned by a SIG or other group chartered either by the TC or UC we can make that work. We can do quite a lot of the setup work while we figure that out, though, so don't lose momentum in the mean time. Doug [1] https://docs.openstack.org/infra/manual/creators.html Excerpts from Chris Morgan's message of 2018-05-23 20:09:05 -0700: > I hadn’t got that far in my thoughts. If you’re able to give that a go, then that would be great! > > Chris > > Sent from my iPhone > > > On May 23, 2018, at 6:46 PM, Jonathan D. Proulx wrote: > > > > > > Thanks for kicking this off Chris. > > > > Were you going to create that new repository? If not I can take on > > the tasks of learning how and making it happen. > > > > -Jon > > > > On Wed, May 23, 2018 at 06:38:32PM -0700, Chris Morgan wrote: > > : Hello Everyone, > > : In the Ops Community documentation working session today in Vancouver, > > : we made some really good progress (etherpad > > : here: [1]https://etherpad.openstack.org/p/YVR-Ops-Community-Docs but > > : not all of the good stuff is yet written down). > > : In short, we're going to course correct on maintaining the Operators > > : Guide, the HA Guide and Architecture Guide, not edit-in-place via the > > : wiki and instead try still maintaining them as code, but with a > > : different, new set of owners, possibly in a new Ops-focused repo. There > > : was a strong consensus that a) code workflow >> wiki workflow and that > > : b) openstack core docs tools are just fine. 
> > : There is a lot still to be decided on how where and when, but we do > > : have an offer of a rewrite of the HA Guide, as long as the changes will > > : be allowed to actually land, so we expect to actually start showing > > : some progress. > > : At the end of the session, people wanted to know how to follow along as > > : various people work out how to do this... and so for now that place is > > : this very email thread. The idea is if the code for those documents > > : goes to live in a different repo, or if new contributors turn up, or if > > : a new version we will announce/discuss it here until such time as we > > : have a better home for this initiative. > > : Cheers > > : Chris > > : -- > > : Chris Morgan <[2]mihalis68 at gmail.com> > > : > > :References > > : > > : 1. https://etherpad.openstack.org/p/YVR-Ops-Community-Docs > > : 2. mailto:mihalis68 at gmail.com > > > > :_______________________________________________ > > :OpenStack-operators mailing list > > :OpenStack-operators at lists.openstack.org > > :http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > > From eumel at arcor.de Thu May 24 04:56:47 2018 From: eumel at arcor.de (Frank Kloeker) Date: Thu, 24 May 2018 06:56:47 +0200 Subject: [Openstack-operators] Ops Community Documentation - first anchor point In-Reply-To: References: Message-ID: <30d4f1a3668445a11fd34b271bc37e94@arcor.de> Hi Chris, thanks for summarizing our session today in Vancouver. As I18n PTL and one of the Docs Core reviewers I put Petr in Cc. He is currently Docs PTL, but unfortunately not on-site. I also couldn't get the full history of the story, and the idea is not to start finger pointing. As usual we are moving forward, and there are some interesting things to know about what happened. First of all: there is no "Docs-Team" anymore. If you look at [1], there are mostly part-time contributors like me, or people who are more involved in other projects and therefore busy. Because of that, the responsibility for documentation content has moved completely to the project teams. Each repo has a user guide, admin guide, deployment guide, and so on. The small Documentation Team only provides tooling and gives advice on how to write and publish a document. So it's up to you to re-use the old repo on [2] or set up a new one. I would recommend using the best of both worlds. There is a very good toolset in place for testing and publishing documents. There are also various text editors with rst support available, such as vim, notepad++ or online services. I understand the concerns, and why people are sad when their patches are ignored for months. But it's always a question of responsibility and of how people can spend their time. I would be available for help. As I18n PTL I could imagine an OpenStack Operations Guide being available in different languages and portable to different formats via Sphinx. For us as the translation team it's a good opportunity to get feedback about quality and to understand the requirements, for other documents as well. So let's move on. kind regards Frank [1] https://review.openstack.org/#/admin/groups/30,members [2] https://github.com/openstack/operations-guide On 2018-05-24 03:38, Chris Morgan wrote: > Hello Everyone, > > In the Ops Community documentation working session today in Vancouver, > we made some really good progress (etherpad here: > https://etherpad.openstack.org/p/YVR-Ops-Community-Docs but not all of > the good stuff is yet written down). 
> > In short, we're going to course correct on maintaining the Operators > Guide, the HA Guide and Architecture Guide, not edit-in-place via the > wiki and instead try still maintaining them as code, but with a > different, new set of owners, possibly in a new Ops-focused repo. > There was a strong consensus that a) code workflow >> wiki workflow > and that b) openstack core docs tools are just fine. > > There is a lot still to be decided on how where and when, but we do > have an offer of a rewrite of the HA Guide, as long as the changes > will be allowed to actually land, so we expect to actually start > showing some progress. > > At the end of the session, people wanted to know how to follow along > as various people work out how to do this... and so for now that place > is this very email thread. The idea is if the code for those documents > goes to live in a different repo, or if new contributors turn up, or > if a new version we will announce/discuss it here until such time as > we have a better home for this initiative. > > Cheers > > Chris > > -- > Chris Morgan > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators From mrhillsman at gmail.com Thu May 24 05:26:02 2018 From: mrhillsman at gmail.com (Melvin Hillsman) Date: Wed, 23 May 2018 22:26:02 -0700 Subject: [Openstack-operators] Ops Community Documentation - first anchor point In-Reply-To: <30d4f1a3668445a11fd34b271bc37e94@arcor.de> References: <30d4f1a3668445a11fd34b271bc37e94@arcor.de> Message-ID: Great to see this moving. I have some questions/concerns based on your statement Doug about docs.openstack.org publishing and do not want to detour the conversation but ask for feedback. Currently there are a number of repositories under osops- https://github.com/openstack-infra/project-config/blob/master/gerrit/projects.yaml#L5673-L5703 Generally active: osops-tools-contrib osops-tools-generic osops-tools-monitoring Probably dead: osops-tools-logging osops-coda osops-example-configs Because you are more familiar with how things work, is there a way to consolidate these vs coming up with another repo like osops-docs or whatever in this case? And second, is there already governance clearance to publish based on the following - https://launchpad.net/osops - which is where these repos originated. On Wed, May 23, 2018 at 9:56 PM, Frank Kloeker wrote: > Hi Chris, > > thanks for summarize our session today in Vancouver. As I18n PTL and one > of the Docs Core I put Petr in Cc. He is currently Docs PTL, but > unfortunatelly not on-site. > I couldn't also not get the full history of the story and that's also not > the idea to starting finger pointing. As usualy we moving forward and there > are some interesting things to know what happened. > First of all: There are no "Docs-Team" anymore. If you look at [1] there > are mostly part-time contributors like me or people are more involved in > other projects and therefore busy. Because of that, the responsibility of > documentation content are moved completely to the project teams. Each repo > has a user guide, admin guide, deployment guide, and so on. The small > Documentation Team provides only tooling and give advices how to write and > publish a document. So it's up to you to re-use the old repo on [2] or > setup a new one. I would recommend to use the best of both worlds. 
There > are a very good toolset in place for testing and publishing documents. > There are also various text editors for rst extensions available, like in > vim, notepad++ or also online services. I understand the concerns and when > people are sad because their patches are ignored for months. But it's > alltime a question of responsibilty and how can spend people time. > I would be available for help. As I18n PTL I could imagine that a > OpenStack Operations Guide is available in different languages and portable > in different formats like in Sphinx. For us as translation team it's a good > possibility to get feedback about the quality and to understand the > requirements, also for other documents. > So let's move on. > > kind regards > > Frank > > [1] https://review.openstack.org/#/admin/groups/30,members > [2] https://github.com/openstack/operations-guide > > > Am 2018-05-24 03:38, schrieb Chris Morgan: > >> Hello Everyone, >> >> In the Ops Community documentation working session today in Vancouver, >> we made some really good progress (etherpad here: >> https://etherpad.openstack.org/p/YVR-Ops-Community-Docs but not all of >> the good stuff is yet written down). >> >> In short, we're going to course correct on maintaining the Operators >> Guide, the HA Guide and Architecture Guide, not edit-in-place via the >> wiki and instead try still maintaining them as code, but with a >> different, new set of owners, possibly in a new Ops-focused repo. >> There was a strong consensus that a) code workflow >> wiki workflow >> and that b) openstack core docs tools are just fine. >> >> There is a lot still to be decided on how where and when, but we do >> have an offer of a rewrite of the HA Guide, as long as the changes >> will be allowed to actually land, so we expect to actually start >> showing some progress. >> >> At the end of the session, people wanted to know how to follow along >> as various people work out how to do this... and so for now that place >> is this very email thread. The idea is if the code for those documents >> goes to live in a different repo, or if new contributors turn up, or >> if a new version we will announce/discuss it here until such time as >> we have a better home for this initiative. >> >> Cheers >> >> Chris >> >> -- >> Chris Morgan >> _______________________________________________ >> OpenStack-operators mailing list >> OpenStack-operators at lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators >> > > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -- Kind regards, Melvin Hillsman mrhillsman at gmail.com mobile: (832) 264-2646 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mrhillsman at gmail.com Thu May 24 05:28:26 2018 From: mrhillsman at gmail.com (Melvin Hillsman) Date: Wed, 23 May 2018 22:28:26 -0700 Subject: [Openstack-operators] Ops Community Documentation - first anchor point In-Reply-To: References: <30d4f1a3668445a11fd34b271bc37e94@arcor.de> Message-ID: Also, apologies, if consolidation or reorganizing all these is reasonable, what do you think that would look like; i.e. osops > tools >> contrib >> generic >> monitoring >> logging > docs > example-configs On Wed, May 23, 2018 at 10:26 PM, Melvin Hillsman wrote: > Great to see this moving. 
I have some questions/concerns based on your > statement Doug about docs.openstack.org publishing and do not want to > detour the conversation but ask for feedback. Currently there are a number > of repositories under osops- > > https://github.com/openstack-infra/project-config/blob/ > master/gerrit/projects.yaml#L5673-L5703 > > Generally active: > osops-tools-contrib > osops-tools-generic > osops-tools-monitoring > > > Probably dead: > osops-tools-logging > osops-coda > osops-example-configs > > Because you are more familiar with how things work, is there a way to > consolidate these vs coming up with another repo like osops-docs or > whatever in this case? And second, is there already governance clearance to > publish based on the following - https://launchpad.net/osops - which is > where these repos originated. > > > On Wed, May 23, 2018 at 9:56 PM, Frank Kloeker wrote: > >> Hi Chris, >> >> thanks for summarize our session today in Vancouver. As I18n PTL and one >> of the Docs Core I put Petr in Cc. He is currently Docs PTL, but >> unfortunatelly not on-site. >> I couldn't also not get the full history of the story and that's also not >> the idea to starting finger pointing. As usualy we moving forward and there >> are some interesting things to know what happened. >> First of all: There are no "Docs-Team" anymore. If you look at [1] there >> are mostly part-time contributors like me or people are more involved in >> other projects and therefore busy. Because of that, the responsibility of >> documentation content are moved completely to the project teams. Each repo >> has a user guide, admin guide, deployment guide, and so on. The small >> Documentation Team provides only tooling and give advices how to write and >> publish a document. So it's up to you to re-use the old repo on [2] or >> setup a new one. I would recommend to use the best of both worlds. There >> are a very good toolset in place for testing and publishing documents. >> There are also various text editors for rst extensions available, like in >> vim, notepad++ or also online services. I understand the concerns and when >> people are sad because their patches are ignored for months. But it's >> alltime a question of responsibilty and how can spend people time. >> I would be available for help. As I18n PTL I could imagine that a >> OpenStack Operations Guide is available in different languages and portable >> in different formats like in Sphinx. For us as translation team it's a good >> possibility to get feedback about the quality and to understand the >> requirements, also for other documents. >> So let's move on. >> >> kind regards >> >> Frank >> >> [1] https://review.openstack.org/#/admin/groups/30,members >> [2] https://github.com/openstack/operations-guide >> >> >> Am 2018-05-24 03:38, schrieb Chris Morgan: >> >>> Hello Everyone, >>> >>> In the Ops Community documentation working session today in Vancouver, >>> we made some really good progress (etherpad here: >>> https://etherpad.openstack.org/p/YVR-Ops-Community-Docs but not all of >>> the good stuff is yet written down). >>> >>> In short, we're going to course correct on maintaining the Operators >>> Guide, the HA Guide and Architecture Guide, not edit-in-place via the >>> wiki and instead try still maintaining them as code, but with a >>> different, new set of owners, possibly in a new Ops-focused repo. >>> There was a strong consensus that a) code workflow >> wiki workflow >>> and that b) openstack core docs tools are just fine. 
>>> >>> There is a lot still to be decided on how where and when, but we do >>> have an offer of a rewrite of the HA Guide, as long as the changes >>> will be allowed to actually land, so we expect to actually start >>> showing some progress. >>> >>> At the end of the session, people wanted to know how to follow along >>> as various people work out how to do this... and so for now that place >>> is this very email thread. The idea is if the code for those documents >>> goes to live in a different repo, or if new contributors turn up, or >>> if a new version we will announce/discuss it here until such time as >>> we have a better home for this initiative. >>> >>> Cheers >>> >>> Chris >>> >>> -- >>> Chris Morgan >>> _______________________________________________ >>> OpenStack-operators mailing list >>> OpenStack-operators at lists.openstack.org >>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators >>> >> >> >> _______________________________________________ >> OpenStack-operators mailing list >> OpenStack-operators at lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators >> > > > > -- > Kind regards, > > Melvin Hillsman > mrhillsman at gmail.com > mobile: (832) 264-2646 > -- Kind regards, Melvin Hillsman mrhillsman at gmail.com mobile: (832) 264-2646 -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at doughellmann.com Thu May 24 05:58:40 2018 From: doug at doughellmann.com (Doug Hellmann) Date: Wed, 23 May 2018 22:58:40 -0700 Subject: [Openstack-operators] Ops Community Documentation - first anchor point In-Reply-To: References: <30d4f1a3668445a11fd34b271bc37e94@arcor.de> Message-ID: <1527141275-sup-1922@lrrr.local> Excerpts from Melvin Hillsman's message of 2018-05-23 22:26:02 -0700: > Great to see this moving. I have some questions/concerns based on your > statement Doug about docs.openstack.org publishing and do not want to > detour the conversation but ask for feedback. Currently there are a number I'm just unclear on that, but don't consider it a blocker. We will sort out whatever governance or policy change is needed to let this move forward. > of repositories under osops- > > https://github.com/openstack-infra/project-config/blob/master/gerrit/projects.yaml#L5673-L5703 > > Generally active: > osops-tools-contrib > osops-tools-generic > osops-tools-monitoring > > > Probably dead: > osops-tools-logging > osops-coda > osops-example-configs > > Because you are more familiar with how things work, is there a way to > consolidate these vs coming up with another repo like osops-docs or > whatever in this case? And second, is there already governance clearance to > publish based on the following - https://launchpad.net/osops - which is > where these repos originated. I don't really know what any of those things are, or whether it makes sense to put this new content there. I assumed we would make a repo with a name like "operations-guide", but that's up to Chris and John. If they think reusing an existing repository makes sense, that would be OK with me, but it's cheap and easy to set up a new one, too. My main concern is that we remove the road blocks, now that we have people interested in contributing to this documentation. > > On Wed, May 23, 2018 at 9:56 PM, Frank Kloeker wrote: > > > Hi Chris, > > > > thanks for summarize our session today in Vancouver. As I18n PTL and one > > of the Docs Core I put Petr in Cc. He is currently Docs PTL, but > > unfortunatelly not on-site. 
> > I couldn't also not get the full history of the story and that's also not > > the idea to starting finger pointing. As usualy we moving forward and there > > are some interesting things to know what happened. > > First of all: There are no "Docs-Team" anymore. If you look at [1] there > > are mostly part-time contributors like me or people are more involved in > > other projects and therefore busy. Because of that, the responsibility of > > documentation content are moved completely to the project teams. Each repo > > has a user guide, admin guide, deployment guide, and so on. The small > > Documentation Team provides only tooling and give advices how to write and > > publish a document. So it's up to you to re-use the old repo on [2] or > > setup a new one. I would recommend to use the best of both worlds. There > > are a very good toolset in place for testing and publishing documents. > > There are also various text editors for rst extensions available, like in > > vim, notepad++ or also online services. I understand the concerns and when > > people are sad because their patches are ignored for months. But it's > > alltime a question of responsibilty and how can spend people time. > > I would be available for help. As I18n PTL I could imagine that a > > OpenStack Operations Guide is available in different languages and portable > > in different formats like in Sphinx. For us as translation team it's a good > > possibility to get feedback about the quality and to understand the > > requirements, also for other documents. > > So let's move on. > > > > kind regards > > > > Frank > > > > [1] https://review.openstack.org/#/admin/groups/30,members > > [2] https://github.com/openstack/operations-guide > > > > > > Am 2018-05-24 03:38, schrieb Chris Morgan: > > > >> Hello Everyone, > >> > >> In the Ops Community documentation working session today in Vancouver, > >> we made some really good progress (etherpad here: > >> https://etherpad.openstack.org/p/YVR-Ops-Community-Docs but not all of > >> the good stuff is yet written down). > >> > >> In short, we're going to course correct on maintaining the Operators > >> Guide, the HA Guide and Architecture Guide, not edit-in-place via the > >> wiki and instead try still maintaining them as code, but with a > >> different, new set of owners, possibly in a new Ops-focused repo. > >> There was a strong consensus that a) code workflow >> wiki workflow > >> and that b) openstack core docs tools are just fine. > >> > >> There is a lot still to be decided on how where and when, but we do > >> have an offer of a rewrite of the HA Guide, as long as the changes > >> will be allowed to actually land, so we expect to actually start > >> showing some progress. > >> > >> At the end of the session, people wanted to know how to follow along > >> as various people work out how to do this... and so for now that place > >> is this very email thread. The idea is if the code for those documents > >> goes to live in a different repo, or if new contributors turn up, or > >> if a new version we will announce/discuss it here until such time as > >> we have a better home for this initiative. 
> >> > >> Cheers > >> > >> Chris > >> > >> -- > >> Chris Morgan > >> _______________________________________________ > >> OpenStack-operators mailing list > >> OpenStack-operators at lists.openstack.org > >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > >> > > > > > > _______________________________________________ > > OpenStack-operators mailing list > > OpenStack-operators at lists.openstack.org > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > > From mrhillsman at gmail.com Thu May 24 06:31:03 2018 From: mrhillsman at gmail.com (Melvin Hillsman) Date: Wed, 23 May 2018 23:31:03 -0700 Subject: [Openstack-operators] Ops Community Documentation - first anchor point In-Reply-To: <1527141275-sup-1922@lrrr.local> References: <30d4f1a3668445a11fd34b271bc37e94@arcor.de> <1527141275-sup-1922@lrrr.local> Message-ID: Sure definitely, that's why I said I was not trying to detour the conversation, but rather asking for feedback. Definitely agree things should continue to plow forward and Chris has been doing an excellent job here and I think it is awesome that he is continuing to push this. On Wed, May 23, 2018 at 10:58 PM, Doug Hellmann wrote: > Excerpts from Melvin Hillsman's message of 2018-05-23 22:26:02 -0700: > > Great to see this moving. I have some questions/concerns based on your > > statement Doug about docs.openstack.org publishing and do not want to > > detour the conversation but ask for feedback. Currently there are a > number > > I'm just unclear on that, but don't consider it a blocker. We will sort > out whatever governance or policy change is needed to let this move > forward. > > > of repositories under osops- > > > > https://github.com/openstack-infra/project-config/blob/ > master/gerrit/projects.yaml#L5673-L5703 > > > > Generally active: > > osops-tools-contrib > > osops-tools-generic > > osops-tools-monitoring > > > > > > Probably dead: > > osops-tools-logging > > osops-coda > > osops-example-configs > > > > Because you are more familiar with how things work, is there a way to > > consolidate these vs coming up with another repo like osops-docs or > > whatever in this case? And second, is there already governance clearance > to > > publish based on the following - https://launchpad.net/osops - which is > > where these repos originated. > > I don't really know what any of those things are, or whether it > makes sense to put this new content there. I assumed we would make > a repo with a name like "operations-guide", but that's up to Chris > and John. If they think reusing an existing repository makes sense, > that would be OK with me, but it's cheap and easy to set up a new > one, too. > > My main concern is that we remove the road blocks, now that we have > people interested in contributing to this documentation. > > > > > On Wed, May 23, 2018 at 9:56 PM, Frank Kloeker wrote: > > > > > Hi Chris, > > > > > > thanks for summarize our session today in Vancouver. As I18n PTL and > one > > > of the Docs Core I put Petr in Cc. He is currently Docs PTL, but > > > unfortunatelly not on-site. > > > I couldn't also not get the full history of the story and that's also > not > > > the idea to starting finger pointing. As usualy we moving forward and > there > > > are some interesting things to know what happened. > > > First of all: There are no "Docs-Team" anymore. If you look at [1] > there > > > are mostly part-time contributors like me or people are more involved > in > > > other projects and therefore busy. 
Because of that, the responsibility > of > > > documentation content are moved completely to the project teams. Each > repo > > > has a user guide, admin guide, deployment guide, and so on. The small > > > Documentation Team provides only tooling and give advices how to write > and > > > publish a document. So it's up to you to re-use the old repo on [2] or > > > setup a new one. I would recommend to use the best of both worlds. > There > > > are a very good toolset in place for testing and publishing documents. > > > There are also various text editors for rst extensions available, like > in > > > vim, notepad++ or also online services. I understand the concerns and > when > > > people are sad because their patches are ignored for months. But it's > > > alltime a question of responsibilty and how can spend people time. > > > I would be available for help. As I18n PTL I could imagine that a > > > OpenStack Operations Guide is available in different languages and > portable > > > in different formats like in Sphinx. For us as translation team it's a > good > > > possibility to get feedback about the quality and to understand the > > > requirements, also for other documents. > > > So let's move on. > > > > > > kind regards > > > > > > Frank > > > > > > [1] https://review.openstack.org/#/admin/groups/30,members > > > [2] https://github.com/openstack/operations-guide > > > > > > > > > Am 2018-05-24 03:38, schrieb Chris Morgan: > > > > > >> Hello Everyone, > > >> > > >> In the Ops Community documentation working session today in Vancouver, > > >> we made some really good progress (etherpad here: > > >> https://etherpad.openstack.org/p/YVR-Ops-Community-Docs but not all > of > > >> the good stuff is yet written down). > > >> > > >> In short, we're going to course correct on maintaining the Operators > > >> Guide, the HA Guide and Architecture Guide, not edit-in-place via the > > >> wiki and instead try still maintaining them as code, but with a > > >> different, new set of owners, possibly in a new Ops-focused repo. > > >> There was a strong consensus that a) code workflow >> wiki workflow > > >> and that b) openstack core docs tools are just fine. > > >> > > >> There is a lot still to be decided on how where and when, but we do > > >> have an offer of a rewrite of the HA Guide, as long as the changes > > >> will be allowed to actually land, so we expect to actually start > > >> showing some progress. > > >> > > >> At the end of the session, people wanted to know how to follow along > > >> as various people work out how to do this... and so for now that place > > >> is this very email thread. The idea is if the code for those documents > > >> goes to live in a different repo, or if new contributors turn up, or > > >> if a new version we will announce/discuss it here until such time as > > >> we have a better home for this initiative. 
> > >> > > >> Cheers > > >> > > >> Chris > > >> > > >> -- > > >> Chris Morgan > > >> _______________________________________________ > > >> OpenStack-operators mailing list > > >> OpenStack-operators at lists.openstack.org > > >> http://lists.openstack.org/cgi-bin/mailman/listinfo/ > openstack-operators > > >> > > > > > > > > > _______________________________________________ > > > OpenStack-operators mailing list > > > OpenStack-operators at lists.openstack.org > > > http://lists.openstack.org/cgi-bin/mailman/listinfo/ > openstack-operators > > > > > > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -- Kind regards, Melvin Hillsman mrhillsman at gmail.com mobile: (832) 264-2646 -------------- next part -------------- An HTML attachment was scrubbed... URL: From radu.popescu at emag.ro Thu May 24 09:07:20 2018 From: radu.popescu at emag.ro (Radu Popescu | eMAG, Technology) Date: Thu, 24 May 2018 09:07:20 +0000 Subject: [Openstack-operators] attaching network cards to VMs taking a very long time In-Reply-To: <43cd8579c761a13dcc81e6ffc9a69089fb421cda.camel@emag.ro> References: <715adc7d-64f6-9545-1bf6-5eb13fb1d991@gmail.com> <350f070b9d654a0a5430fafb07bcc1d41c98d2f8.camel@emag.ro> <43cd8579c761a13dcc81e6ffc9a69089fb421cda.camel@emag.ro> Message-ID: <6ec89b3767f26b73451fa490d18ed7756266d3ec.camel@emag.ro> Hi, did the change yesterday. Had no issue this morning with neutron not being able to move fast enough. Still, we had some storage issues, but that's another thing. Anyway, I'll leave it like this for the next few days and report back in case I get the same slow neutron errors. Thanks a lot! Radu On Wed, 2018-05-23 at 10:08 +0000, Radu Popescu | eMAG, Technology wrote: Hi, actually, I didn't know about that option. I'll enable it right now. Testing is done every morning at about 4:00AM ..so I'll know tomorrow morning if it changed anything. Thanks, Radu On Tue, 2018-05-22 at 15:30 +0200, Saverio Proto wrote: Sorry email went out incomplete. Read this: https://cloudblog.switch.ch/2017/08/28/starting-1000-instances-on-switchengines/ make sure that Openstack rootwrap configured to work in daemon mode Thank you Saverio 2018-05-22 15:29 GMT+02:00 Saverio Proto >: Hello Radu, do you have the Openstack rootwrap configured to work in daemon mode ? please read this article: 2018-05-18 10:21 GMT+02:00 Radu Popescu | eMAG, Technology >: Hi, so, nova says the VM is ACTIVE and actually boots with no network. We are setting some metadata that we use later on and have cloud-init for different tasks. So, VM is up, OS is running, but network is working after a random amount of time, that can get to around 45 minutes. Thing is, is not happening to all VMs in that test (around 300), but it's happening to a fair amount - around 25%. I can see the callback coming few seconds after neutron openvswitch agent says it's completed the setup. My question is, why is it taking so long for nova openvswitch agent to configure the port? I can see the port up in both host OS and openvswitch. I would assume it's doing the whole namespace and iptables setup. But still, 30 minutes? Seems a lot! Thanks, Radu On Thu, 2018-05-17 at 11:50 -0400, George Mihaiescu wrote: We have other scheduled tests that perform end-to-end (assign floating IP, ssh, ping outside) and never had an issue. 
I think we turned it off because the callback code was initially buggy and nova would wait forever while things were in fact ok, but I'll change "vif_plugging_is_fatal = True" and "vif_plugging_timeout = 300" and run another large test, just to confirm. We usually run these large tests after a version upgrade to test the APIs under load. On Thu, May 17, 2018 at 11:42 AM, Matt Riedemann > wrote: On 5/17/2018 9:46 AM, George Mihaiescu wrote: and large rally tests of 500 instances complete with no issues. Sure, except you can't ssh into the guests. The whole reason the vif plugging is fatal and timeout and callback code was because the upstream CI was unstable without it. The server would report as ACTIVE but the ports weren't wired up so ssh would fail. Having an ACTIVE guest that you can't actually do anything with is kind of pointless. _______________________________________________ OpenStack-operators mailing list OpenStack-operators at lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators _______________________________________________ OpenStack-operators mailing list OpenStack-operators at lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators _______________________________________________ OpenStack-operators mailing list OpenStack-operators at lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators -------------- next part -------------- An HTML attachment was scrubbed... URL: From zioproto at gmail.com Thu May 24 09:51:10 2018 From: zioproto at gmail.com (Saverio Proto) Date: Thu, 24 May 2018 11:51:10 +0200 Subject: [Openstack-operators] attaching network cards to VMs taking a very long time In-Reply-To: <6ec89b3767f26b73451fa490d18ed7756266d3ec.camel@emag.ro> References: <715adc7d-64f6-9545-1bf6-5eb13fb1d991@gmail.com> <350f070b9d654a0a5430fafb07bcc1d41c98d2f8.camel@emag.ro> <43cd8579c761a13dcc81e6ffc9a69089fb421cda.camel@emag.ro> <6ec89b3767f26b73451fa490d18ed7756266d3ec.camel@emag.ro> Message-ID: Glad to hear it! Always monitor rabbitmq queues to identify bottlenecks !! :) Cheers Saverio Il gio 24 mag 2018, 11:07 Radu Popescu | eMAG, Technology < radu.popescu at emag.ro> ha scritto: > Hi, > > did the change yesterday. Had no issue this morning with neutron not being > able to move fast enough. Still, we had some storage issues, but that's > another thing. > Anyway, I'll leave it like this for the next few days and report back in > case I get the same slow neutron errors. > > Thanks a lot! > Radu > > On Wed, 2018-05-23 at 10:08 +0000, Radu Popescu | eMAG, Technology wrote: > > Hi, > > actually, I didn't know about that option. I'll enable it right now. > Testing is done every morning at about 4:00AM ..so I'll know tomorrow > morning if it changed anything. > > Thanks, > Radu > > On Tue, 2018-05-22 at 15:30 +0200, Saverio Proto wrote: > > Sorry email went out incomplete. > > Read this: > > https://cloudblog.switch.ch/2017/08/28/starting-1000-instances-on-switchengines/ > > > make sure that Openstack rootwrap configured to work in daemon mode > > > Thank you > > > Saverio > > > > 2018-05-22 15:29 GMT+02:00 Saverio Proto : > > Hello Radu, > > > do you have the Openstack rootwrap configured to work in daemon mode ? > > > please read this article: > > > 2018-05-18 10:21 GMT+02:00 Radu Popescu | eMAG, Technology > > : > > Hi, > > > so, nova says the VM is ACTIVE and actually boots with no network. 
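To recap the two knobs discussed in this thread, plus Saverio's monitoring advice, in rough form (a sketch only; exact config file names and paths vary by distro and deployment tool):

  # 1) Run the neutron agents' rootwrap in daemon mode so port wiring does not
  #    pay the rootwrap start-up cost for every command, e.g. in
  #    openvswitch_agent.ini / l3_agent.ini / dhcp_agent.ini:
  #      [agent]
  #      root_helper_daemon = sudo neutron-rootwrap-daemon /etc/neutron/rootwrap.conf
  # 2) Make nova-compute treat a missing vif-plugged callback as fatal instead
  #    of reporting ACTIVE with no networking, in nova.conf:
  #      [DEFAULT]
  #      vif_plugging_is_fatal = True
  #      vif_plugging_timeout = 300
  # 3) Watch RabbitMQ for queue build-up while a large test runs:
  rabbitmqctl list_queues name messages consumers | sort -n -k2 | tail -20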
We are > > setting some metadata that we use later on and have cloud-init for different > > tasks. > > So, VM is up, OS is running, but network is working after a random amount of > > time, that can get to around 45 minutes. Thing is, is not happening to all > > VMs in that test (around 300), but it's happening to a fair amount - around > > 25%. > > > I can see the callback coming few seconds after neutron openvswitch agent > > says it's completed the setup. My question is, why is it taking so long for > > nova openvswitch agent to configure the port? I can see the port up in both > > host OS and openvswitch. I would assume it's doing the whole namespace and > > iptables setup. But still, 30 minutes? Seems a lot! > > > Thanks, > > Radu > > > On Thu, 2018-05-17 at 11:50 -0400, George Mihaiescu wrote: > > > We have other scheduled tests that perform end-to-end (assign floating IP, > > ssh, ping outside) and never had an issue. > > I think we turned it off because the callback code was initially buggy and > > nova would wait forever while things were in fact ok, but I'll change > > "vif_plugging_is_fatal = True" and "vif_plugging_timeout = 300" and run > > another large test, just to confirm. > > > We usually run these large tests after a version upgrade to test the APIs > > under load. > > > > > On Thu, May 17, 2018 at 11:42 AM, Matt Riedemann > > wrote: > > > On 5/17/2018 9:46 AM, George Mihaiescu wrote: > > > and large rally tests of 500 instances complete with no issues. > > > > Sure, except you can't ssh into the guests. > > > The whole reason the vif plugging is fatal and timeout and callback code was > > because the upstream CI was unstable without it. The server would report as > > ACTIVE but the ports weren't wired up so ssh would fail. Having an ACTIVE > > guest that you can't actually do anything with is kind of pointless. > > > _______________________________________________ > > > OpenStack-operators mailing list > > > OpenStack-operators at lists.openstack.org > > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > > > > _______________________________________________ > > OpenStack-operators mailing list > > OpenStack-operators at lists.openstack.org > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > > _______________________________________________ > > OpenStack-operators mailing list > > OpenStack-operators at lists.openstack.org > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From doug at doughellmann.com Thu May 24 14:07:10 2018 From: doug at doughellmann.com (Doug Hellmann) Date: Thu, 24 May 2018 07:07:10 -0700 Subject: [Openstack-operators] Ops Community Documentation - first anchor point In-Reply-To: <1527141275-sup-1922@lrrr.local> References: <30d4f1a3668445a11fd34b271bc37e94@arcor.de> <1527141275-sup-1922@lrrr.local> Message-ID: <1527170041-sup-4254@lrrr.local> Excerpts from Doug Hellmann's message of 2018-05-23 22:58:40 -0700: > Excerpts from Melvin Hillsman's message of 2018-05-23 22:26:02 -0700: > > Great to see this moving. I have some questions/concerns based on your > > statement Doug about docs.openstack.org publishing and do not want to > > detour the conversation but ask for feedback. Currently there are a number > > I'm just unclear on that, but don't consider it a blocker. We will sort > out whatever governance or policy change is needed to let this move > forward. 
When I talked with Petr about it, he pointed to the Security SIG and Security Guide as a parallel precedent for this. IIRC, yesterday Adam mentioned that the Self-Healing SIG was also going to be managing some documentation, so we have two examples. Looking at https://governance.openstack.org/sigs/, I don't see another existing SIG that it would make sense to join, so, I think to deal with the publishing rights we would want set up a SIG for something like "Operator Documentation," which gives you some flexibility on exactly what content is managed. I know you wanted to avoid lots of governance overhead, so I want to just mention that establishing a SIG is meant to be a painless and light-weight way to declare that a group of interested people exists so that others can find them and participate in the work [1]. It shouldn't take much effort to do the setup, and any ongoing communication is something you would presumably by doing anyway among a group of people trying to collaborate on a project like this. Let me know if you have any questions or concerns about the process. Doug [1] https://governance.openstack.org/sigs/#process-to-create-a-sig > > > of repositories under osops- > > > > https://github.com/openstack-infra/project-config/blob/master/gerrit/projects.yaml#L5673-L5703 > > > > Generally active: > > osops-tools-contrib > > osops-tools-generic > > osops-tools-monitoring > > > > > > Probably dead: > > osops-tools-logging > > osops-coda > > osops-example-configs > > > > Because you are more familiar with how things work, is there a way to > > consolidate these vs coming up with another repo like osops-docs or > > whatever in this case? And second, is there already governance clearance to > > publish based on the following - https://launchpad.net/osops - which is > > where these repos originated. > > I don't really know what any of those things are, or whether it > makes sense to put this new content there. I assumed we would make > a repo with a name like "operations-guide", but that's up to Chris > and John. If they think reusing an existing repository makes sense, > that would be OK with me, but it's cheap and easy to set up a new > one, too. > > My main concern is that we remove the road blocks, now that we have > people interested in contributing to this documentation. > > > > > On Wed, May 23, 2018 at 9:56 PM, Frank Kloeker wrote: > > > > > Hi Chris, > > > > > > thanks for summarize our session today in Vancouver. As I18n PTL and one > > > of the Docs Core I put Petr in Cc. He is currently Docs PTL, but > > > unfortunatelly not on-site. > > > I couldn't also not get the full history of the story and that's also not > > > the idea to starting finger pointing. As usualy we moving forward and there > > > are some interesting things to know what happened. > > > First of all: There are no "Docs-Team" anymore. If you look at [1] there > > > are mostly part-time contributors like me or people are more involved in > > > other projects and therefore busy. Because of that, the responsibility of > > > documentation content are moved completely to the project teams. Each repo > > > has a user guide, admin guide, deployment guide, and so on. The small > > > Documentation Team provides only tooling and give advices how to write and > > > publish a document. So it's up to you to re-use the old repo on [2] or > > > setup a new one. I would recommend to use the best of both worlds. There > > > are a very good toolset in place for testing and publishing documents. 
> > > There are also various text editors for rst extensions available, like in > > > vim, notepad++ or also online services. I understand the concerns and when > > > people are sad because their patches are ignored for months. But it's > > > alltime a question of responsibilty and how can spend people time. > > > I would be available for help. As I18n PTL I could imagine that a > > > OpenStack Operations Guide is available in different languages and portable > > > in different formats like in Sphinx. For us as translation team it's a good > > > possibility to get feedback about the quality and to understand the > > > requirements, also for other documents. > > > So let's move on. > > > > > > kind regards > > > > > > Frank > > > > > > [1] https://review.openstack.org/#/admin/groups/30,members > > > [2] https://github.com/openstack/operations-guide > > > > > > > > > Am 2018-05-24 03:38, schrieb Chris Morgan: > > > > > >> Hello Everyone, > > >> > > >> In the Ops Community documentation working session today in Vancouver, > > >> we made some really good progress (etherpad here: > > >> https://etherpad.openstack.org/p/YVR-Ops-Community-Docs but not all of > > >> the good stuff is yet written down). > > >> > > >> In short, we're going to course correct on maintaining the Operators > > >> Guide, the HA Guide and Architecture Guide, not edit-in-place via the > > >> wiki and instead try still maintaining them as code, but with a > > >> different, new set of owners, possibly in a new Ops-focused repo. > > >> There was a strong consensus that a) code workflow >> wiki workflow > > >> and that b) openstack core docs tools are just fine. > > >> > > >> There is a lot still to be decided on how where and when, but we do > > >> have an offer of a rewrite of the HA Guide, as long as the changes > > >> will be allowed to actually land, so we expect to actually start > > >> showing some progress. > > >> > > >> At the end of the session, people wanted to know how to follow along > > >> as various people work out how to do this... and so for now that place > > >> is this very email thread. The idea is if the code for those documents > > >> goes to live in a different repo, or if new contributors turn up, or > > >> if a new version we will announce/discuss it here until such time as > > >> we have a better home for this initiative. > > >> > > >> Cheers > > >> > > >> Chris > > >> > > >> -- > > >> Chris Morgan > > >> _______________________________________________ > > >> OpenStack-operators mailing list > > >> OpenStack-operators at lists.openstack.org > > >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > >> > > > > > > > > > _______________________________________________ > > > OpenStack-operators mailing list > > > OpenStack-operators at lists.openstack.org > > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > > > > From jon at csail.mit.edu Thu May 24 14:19:29 2018 From: jon at csail.mit.edu (Jonathan D. Proulx) Date: Thu, 24 May 2018 07:19:29 -0700 Subject: [Openstack-operators] Ops Community Documentation - first anchor point In-Reply-To: <30d4f1a3668445a11fd34b271bc37e94@arcor.de> References: <30d4f1a3668445a11fd34b271bc37e94@arcor.de> Message-ID: <20180524141929.2vylwguebcgkjxa3@csail.mit.edu> My intention based on current understandign would be to create a git repo called "osops-docs" as this fits current naming an thin initial document we intend to put there and the others we may adopt from docs-team. 
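For reference, the "create the repo" step from the project creators' guide is mostly a couple of small reviews against openstack-infra/project-config plus a .gitreview file in the new repository. A rough sketch, where the repo name is only the placeholder proposed above and nothing decided:

  # gerrit/projects.yaml in openstack-infra/project-config gains an entry like:
  #   - project: openstack/osops-docs
  #     description: Operator-maintained documentation (ops guide, HA guide, ...)
  # plus a matching ACL file under gerrit/acls/, and then in the new repo itself
  # a .gitreview:
  #   [gerrit]
  #   host=review.openstack.org
  #   port=29418
  #   project=openstack/osops-docs.git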
My understanding is that the docs team doesn't want to host this type of documentation due to a much reduced team size and prefers it to live with subject matter experts. Is that correct? If that's not correct I'm not personally opposed to trying this under docs. We'll need to maintain enough contributors and reviewers to make the workflow go in either location, and that's my understanding of the basic issue, not where it lives. This naming would also match other repos which could be consolidated into an "osops" repo to rule them all. That may make sense as I think there's significant overlap in the set of people who might contribute, but that can be a parallel conversation. Doug, looking at the new project docs I think most of it is clear enough to me. Since it's not code I can skip all the PyPI stuff, yes? The repo creation seems pretty clear and I can steal the CI stuff from similar projects. I'm a little unclear on the Storyboard bit; I've not done much contribution lately and haven't used Storyboard. Is that relevant (or at least relevant at first) for this use case? If it is I probably have more questions. I agree governance can also be a parallel discussion. I don't have strong opinions there, but based on participants and content it seems like a "UC" thing. < shrug /> -Jon From jon at csail.mit.edu Thu May 24 14:26:55 2018 From: jon at csail.mit.edu (Jonathan D. Proulx) Date: Thu, 24 May 2018 07:26:55 -0700 Subject: Re: [Openstack-operators] Ops Community Documentation - first anchor point In-Reply-To: <1527170041-sup-4254@lrrr.local> References: <30d4f1a3668445a11fd34b271bc37e94@arcor.de> <1527141275-sup-1922@lrrr.local> <1527170041-sup-4254@lrrr.local> Message-ID: <20180524142655.btcnop2tpexq32of@csail.mit.edu> On Thu, May 24, 2018 at 07:07:10AM -0700, Doug Hellmann wrote: :I know you wanted to avoid lots of governance overhead, so I want :to just mention that establishing a SIG is meant to be a painless :and light-weight way to declare that a group of interested people :exists so that others can find them and participate in the work :[1]. It shouldn't take much effort to do the setup, and any ongoing :communication is something you would presumably by doing anyway :among a group of people trying to collaborate on a project like :this. Yeah I can see SIG as a useful structure too. I'm just more familiar with UC "teams" because of my personal history. I do think SIG -vs- team would impact repo naming, and I'm still going over the creation doc, so I'll let this simmer here at least until YVR lunch time to see if there's consensus or controversy in the potential contributor community. Lacking either I think I will default to SIG-ops-docs. Thanks, -Jon : :Let me know if you have any questions or concerns about the process. : :Doug : :[1] https://governance.openstack.org/sigs/#process-to-create-a-sig : :> :> > of repositories under osops- :> > :> > https://github.com/openstack-infra/project-config/blob/master/gerrit/projects.yaml#L5673-L5703 :> > :> > Generally active: :> > osops-tools-contrib :> > osops-tools-generic :> > osops-tools-monitoring :> > :> > :> > Probably dead: :> > osops-tools-logging :> > osops-coda :> > osops-example-configs :> > :> > Because you are more familiar with how things work, is there a way to :> > consolidate these vs coming up with another repo like osops-docs or :> > whatever in this case? And second, is there already governance clearance to :> > publish based on the following - https://launchpad.net/osops - which is :> > where these repos originated. 
:> :> I don't really know what any of those things are, or whether it :> makes sense to put this new content there. I assumed we would make :> a repo with a name like "operations-guide", but that's up to Chris :> and John. If they think reusing an existing repository makes sense, :> that would be OK with me, but it's cheap and easy to set up a new :> one, too. :> :> My main concern is that we remove the road blocks, now that we have :> people interested in contributing to this documentation. :> :> > :> > On Wed, May 23, 2018 at 9:56 PM, Frank Kloeker wrote: :> > :> > > Hi Chris, :> > > :> > > thanks for summarize our session today in Vancouver. As I18n PTL and one :> > > of the Docs Core I put Petr in Cc. He is currently Docs PTL, but :> > > unfortunatelly not on-site. :> > > I couldn't also not get the full history of the story and that's also not :> > > the idea to starting finger pointing. As usualy we moving forward and there :> > > are some interesting things to know what happened. :> > > First of all: There are no "Docs-Team" anymore. If you look at [1] there :> > > are mostly part-time contributors like me or people are more involved in :> > > other projects and therefore busy. Because of that, the responsibility of :> > > documentation content are moved completely to the project teams. Each repo :> > > has a user guide, admin guide, deployment guide, and so on. The small :> > > Documentation Team provides only tooling and give advices how to write and :> > > publish a document. So it's up to you to re-use the old repo on [2] or :> > > setup a new one. I would recommend to use the best of both worlds. There :> > > are a very good toolset in place for testing and publishing documents. :> > > There are also various text editors for rst extensions available, like in :> > > vim, notepad++ or also online services. I understand the concerns and when :> > > people are sad because their patches are ignored for months. But it's :> > > alltime a question of responsibilty and how can spend people time. :> > > I would be available for help. As I18n PTL I could imagine that a :> > > OpenStack Operations Guide is available in different languages and portable :> > > in different formats like in Sphinx. For us as translation team it's a good :> > > possibility to get feedback about the quality and to understand the :> > > requirements, also for other documents. :> > > So let's move on. :> > > :> > > kind regards :> > > :> > > Frank :> > > :> > > [1] https://review.openstack.org/#/admin/groups/30,members :> > > [2] https://github.com/openstack/operations-guide :> > > :> > > :> > > Am 2018-05-24 03:38, schrieb Chris Morgan: :> > > :> > >> Hello Everyone, :> > >> :> > >> In the Ops Community documentation working session today in Vancouver, :> > >> we made some really good progress (etherpad here: :> > >> https://etherpad.openstack.org/p/YVR-Ops-Community-Docs but not all of :> > >> the good stuff is yet written down). :> > >> :> > >> In short, we're going to course correct on maintaining the Operators :> > >> Guide, the HA Guide and Architecture Guide, not edit-in-place via the :> > >> wiki and instead try still maintaining them as code, but with a :> > >> different, new set of owners, possibly in a new Ops-focused repo. :> > >> There was a strong consensus that a) code workflow >> wiki workflow :> > >> and that b) openstack core docs tools are just fine. 
:> > >> :> > >> There is a lot still to be decided on how where and when, but we do :> > >> have an offer of a rewrite of the HA Guide, as long as the changes :> > >> will be allowed to actually land, so we expect to actually start :> > >> showing some progress. :> > >> :> > >> At the end of the session, people wanted to know how to follow along :> > >> as various people work out how to do this... and so for now that place :> > >> is this very email thread. The idea is if the code for those documents :> > >> goes to live in a different repo, or if new contributors turn up, or :> > >> if a new version we will announce/discuss it here until such time as :> > >> we have a better home for this initiative. :> > >> :> > >> Cheers :> > >> :> > >> Chris :> > >> :> > >> -- :> > >> Chris Morgan :> > >> _______________________________________________ :> > >> OpenStack-operators mailing list :> > >> OpenStack-operators at lists.openstack.org :> > >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators :> > >> :> > > :> > > :> > > _______________________________________________ :> > > OpenStack-operators mailing list :> > > OpenStack-operators at lists.openstack.org :> > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators :> > > :> > : :_______________________________________________ :OpenStack-operators mailing list :OpenStack-operators at lists.openstack.org :http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators From mrhillsman at gmail.com Thu May 24 20:26:07 2018 From: mrhillsman at gmail.com (Melvin Hillsman) Date: Thu, 24 May 2018 13:26:07 -0700 Subject: [Openstack-operators] Ops Community Documentation - first anchor point In-Reply-To: <20180524142655.btcnop2tpexq32of@csail.mit.edu> References: <30d4f1a3668445a11fd34b271bc37e94@arcor.de> <1527141275-sup-1922@lrrr.local> <1527170041-sup-4254@lrrr.local> <20180524142655.btcnop2tpexq32of@csail.mit.edu> Message-ID: I think a great model we have in general as a community is if people show up to do the work, it is not something crazy, get out of their way; at least that is how I think of it. I apologize if there is any perception opposed to my previous statement by me bringing up the other repos. I tried to be clear in wanting to get feedback from Doug in hope that as we move forward in general, what are some thoughts on that front to ensure we continue to remove roadblocks if any exist in parallel to great work, like what Chris is driving here. On that front, please do what works best for those doing the work. On Thu, May 24, 2018 at 7:26 AM, Jonathan D. Proulx wrote: > On Thu, May 24, 2018 at 07:07:10AM -0700, Doug Hellmann wrote: > > :I know you wanted to avoid lots of governance overhead, so I want > :to just mention that establishing a SIG is meant to be a painless > :and light-weight way to declare that a group of interested people > :exists so that others can find them and participate in the work > :[1]. It shouldn't take much effort to do the setup, and any ongoing > :communication is something you would presumably by doing anyway > :among a group of people trying to collaborate on a project like > :this. > > Yeah I can see SIG as a useful structure too. I'm just more familiar > with UC "teams" because of my personal history. > > I do thing SIG -vs- team would impace repo naming, and I'm still going > over creation doc, so I'll let this simmer here at least until YVR lunch > time to see if there's consensus or cotroversy in the potential > contributer community. 
Lacking either I think I will default to > SIG-ops-docs. > > Thanks, > -Jon > > : > :Let me know if you have any questions or concerns about the process. > : > :Doug > : > :[1] https://governance.openstack.org/sigs/#process-to-create-a-sig > : > :> > :> > of repositories under osops- > :> > > :> > https://github.com/openstack-infra/project-config/blob/ > master/gerrit/projects.yaml#L5673-L5703 > :> > > :> > Generally active: > :> > osops-tools-contrib > :> > osops-tools-generic > :> > osops-tools-monitoring > :> > > :> > > :> > Probably dead: > :> > osops-tools-logging > :> > osops-coda > :> > osops-example-configs > :> > > :> > Because you are more familiar with how things work, is there a way to > :> > consolidate these vs coming up with another repo like osops-docs or > :> > whatever in this case? And second, is there already governance > clearance to > :> > publish based on the following - https://launchpad.net/osops - which > is > :> > where these repos originated. > :> > :> I don't really know what any of those things are, or whether it > :> makes sense to put this new content there. I assumed we would make > :> a repo with a name like "operations-guide", but that's up to Chris > :> and John. If they think reusing an existing repository makes sense, > :> that would be OK with me, but it's cheap and easy to set up a new > :> one, too. > :> > :> My main concern is that we remove the road blocks, now that we have > :> people interested in contributing to this documentation. > :> > :> > > :> > On Wed, May 23, 2018 at 9:56 PM, Frank Kloeker > wrote: > :> > > :> > > Hi Chris, > :> > > > :> > > thanks for summarize our session today in Vancouver. As I18n PTL > and one > :> > > of the Docs Core I put Petr in Cc. He is currently Docs PTL, but > :> > > unfortunatelly not on-site. > :> > > I couldn't also not get the full history of the story and that's > also not > :> > > the idea to starting finger pointing. As usualy we moving forward > and there > :> > > are some interesting things to know what happened. > :> > > First of all: There are no "Docs-Team" anymore. If you look at [1] > there > :> > > are mostly part-time contributors like me or people are more > involved in > :> > > other projects and therefore busy. Because of that, the > responsibility of > :> > > documentation content are moved completely to the project teams. > Each repo > :> > > has a user guide, admin guide, deployment guide, and so on. The > small > :> > > Documentation Team provides only tooling and give advices how to > write and > :> > > publish a document. So it's up to you to re-use the old repo on [2] > or > :> > > setup a new one. I would recommend to use the best of both worlds. > There > :> > > are a very good toolset in place for testing and publishing > documents. > :> > > There are also various text editors for rst extensions available, > like in > :> > > vim, notepad++ or also online services. I understand the concerns > and when > :> > > people are sad because their patches are ignored for months. But > it's > :> > > alltime a question of responsibilty and how can spend people time. > :> > > I would be available for help. As I18n PTL I could imagine that a > :> > > OpenStack Operations Guide is available in different languages and > portable > :> > > in different formats like in Sphinx. For us as translation team > it's a good > :> > > possibility to get feedback about the quality and to understand the > :> > > requirements, also for other documents. > :> > > So let's move on. 
> :> > > > :> > > kind regards > :> > > > :> > > Frank > :> > > > :> > > [1] https://review.openstack.org/#/admin/groups/30,members > :> > > [2] https://github.com/openstack/operations-guide > :> > > > :> > > > :> > > Am 2018-05-24 03:38, schrieb Chris Morgan: > :> > > > :> > >> Hello Everyone, > :> > >> > :> > >> In the Ops Community documentation working session today in > Vancouver, > :> > >> we made some really good progress (etherpad here: > :> > >> https://etherpad.openstack.org/p/YVR-Ops-Community-Docs but not > all of > :> > >> the good stuff is yet written down). > :> > >> > :> > >> In short, we're going to course correct on maintaining the > Operators > :> > >> Guide, the HA Guide and Architecture Guide, not edit-in-place via > the > :> > >> wiki and instead try still maintaining them as code, but with a > :> > >> different, new set of owners, possibly in a new Ops-focused repo. > :> > >> There was a strong consensus that a) code workflow >> wiki workflow > :> > >> and that b) openstack core docs tools are just fine. > :> > >> > :> > >> There is a lot still to be decided on how where and when, but we do > :> > >> have an offer of a rewrite of the HA Guide, as long as the changes > :> > >> will be allowed to actually land, so we expect to actually start > :> > >> showing some progress. > :> > >> > :> > >> At the end of the session, people wanted to know how to follow > along > :> > >> as various people work out how to do this... and so for now that > place > :> > >> is this very email thread. The idea is if the code for those > documents > :> > >> goes to live in a different repo, or if new contributors turn up, > or > :> > >> if a new version we will announce/discuss it here until such time > as > :> > >> we have a better home for this initiative. > :> > >> > :> > >> Cheers > :> > >> > :> > >> Chris > :> > >> > :> > >> -- > :> > >> Chris Morgan > :> > >> _______________________________________________ > :> > >> OpenStack-operators mailing list > :> > >> OpenStack-operators at lists.openstack.org > :> > >> http://lists.openstack.org/cgi-bin/mailman/listinfo/ > openstack-operators > :> > >> > :> > > > :> > > > :> > > _______________________________________________ > :> > > OpenStack-operators mailing list > :> > > OpenStack-operators at lists.openstack.org > :> > > http://lists.openstack.org/cgi-bin/mailman/listinfo/ > openstack-operators > :> > > > :> > > : > :_______________________________________________ > :OpenStack-operators mailing list > :OpenStack-operators at lists.openstack.org > :http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -- Kind regards, Melvin Hillsman mrhillsman at gmail.com mobile: (832) 264-2646 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jon at csail.mit.edu Thu May 24 20:31:29 2018 From: jon at csail.mit.edu (Jonathan D. 
Proulx) Date: Thu, 24 May 2018 13:31:29 -0700 Subject: [Openstack-operators] Ops Community Documentation - first anchor point In-Reply-To: References: <30d4f1a3668445a11fd34b271bc37e94@arcor.de> <1527141275-sup-1922@lrrr.local> <1527170041-sup-4254@lrrr.local> <20180524142655.btcnop2tpexq32of@csail.mit.edu> Message-ID: <20180524203129.ovlsvq4qe6tcrh7b@csail.mit.edu> On Thu, May 24, 2018 at 01:26:07PM -0700, Melvin Hillsman wrote: : I think a great model we have in general as a community is if people : show up to do the work, it is not something crazy, get out of their : way; at least that is how I think of it. I apologize if there is any : perception opposed to my previous statement by me bringing up the other : repos. I tried to be clear in wanting to get feedback from Doug in hope : that as we move forward in general, what are some thoughts on that : front to ensure we continue to remove roadblocks if any exist in : parallel to great work, like what Chris is driving here. On that front, : please do what works best for those doing the work. No worries I feel the love :) Going to go forward implemnting as SIG + repo which seems lightest way forward, we can always adapt and evolve. -Jon : On Thu, May 24, 2018 at 7:26 AM, Jonathan D. Proulx : <[1]jon at csail.mit.edu> wrote: : : On Thu, May 24, 2018 at 07:07:10AM -0700, Doug Hellmann wrote: : :I know you wanted to avoid lots of governance overhead, so I want : :to just mention that establishing a SIG is meant to be a painless : :and light-weight way to declare that a group of interested people : :exists so that others can find them and participate in the work : :[1]. It shouldn't take much effort to do the setup, and any ongoing : :communication is something you would presumably by doing anyway : :among a group of people trying to collaborate on a project like : :this. : Yeah I can see SIG as a useful structure too. I'm just more : familiar : with UC "teams" because of my personal history. : I do thing SIG -vs- team would impace repo naming, and I'm still : going : over creation doc, so I'll let this simmer here at least until YVR : lunch : time to see if there's consensus or cotroversy in the potential : contributer community. Lacking either I think I will default to : SIG-ops-docs. : Thanks, : -Jon : : : :Let me know if you have any questions or concerns about the : process. : : : : :Doug : : : :[1] [2]https://governance.openstack.org/sigs/#process-to-create-a-sig : : : :> : :> > of repositories under osops- : :> > : :> > [3]https://github.com/openstack-infra/project-config/blob/ : master/gerrit/projects.yaml#L5673-L5703 : :> > : :> > Generally active: : :> > osops-tools-contrib : :> > osops-tools-generic : :> > osops-tools-monitoring : :> > : :> > : :> > Probably dead: : :> > osops-tools-logging : :> > osops-coda : :> > osops-example-configs : :> > : :> > Because you are more familiar with how things work, is there a way : to : :> > consolidate these vs coming up with another repo like osops-docs : or : :> > whatever in this case? And second, is there already governance : clearance to : :> > publish based on the following - [4]https://launchpad.net/osops - : which is : :> > where these repos originated. : :> : :> I don't really know what any of those things are, or whether it : :> makes sense to put this new content there. I assumed we would make : :> a repo with a name like "operations-guide", but that's up to Chris : :> and John. 
If they think reusing an existing repository makes sense, : :> that would be OK with me, but it's cheap and easy to set up a new : :> one, too. : :> : :> My main concern is that we remove the road blocks, now that we have : :> people interested in contributing to this documentation. : :> : :> > : :> > On Wed, May 23, 2018 at 9:56 PM, Frank Kloeker <[5]eumel at arcor.de> : wrote: : :> > : :> > > Hi Chris, : :> > > : :> > > thanks for summarize our session today in Vancouver. As I18n PTL : and one : :> > > of the Docs Core I put Petr in Cc. He is currently Docs PTL, but : :> > > unfortunatelly not on-site. : :> > > I couldn't also not get the full history of the story and that's : also not : :> > > the idea to starting finger pointing. As usualy we moving : forward and there : :> > > are some interesting things to know what happened. : :> > > First of all: There are no "Docs-Team" anymore. If you look at : [1] there : :> > > are mostly part-time contributors like me or people are more : involved in : :> > > other projects and therefore busy. Because of that, the : responsibility of : :> > > documentation content are moved completely to the project teams. : Each repo : :> > > has a user guide, admin guide, deployment guide, and so on. The : small : :> > > Documentation Team provides only tooling and give advices how to : write and : :> > > publish a document. So it's up to you to re-use the old repo on : [2] or : :> > > setup a new one. I would recommend to use the best of both : worlds. There : :> > > are a very good toolset in place for testing and publishing : documents. : :> > > There are also various text editors for rst extensions : available, like in : :> > > vim, notepad++ or also online services. I understand the : concerns and when : :> > > people are sad because their patches are ignored for months. But : it's : :> > > alltime a question of responsibilty and how can spend people : time. : :> > > I would be available for help. As I18n PTL I could imagine that : a : :> > > OpenStack Operations Guide is available in different languages : and portable : :> > > in different formats like in Sphinx. For us as translation team : it's a good : :> > > possibility to get feedback about the quality and to understand : the : :> > > requirements, also for other documents. : :> > > So let's move on. : :> > > : :> > > kind regards : :> > > : :> > > Frank : :> > > : :> > > [1] [6]https://review.openstack.org/#/admin/groups/30,members : :> > > [2] [7]https://github.com/openstack/operations-guide : :> > > : :> > > : :> > > Am 2018-05-24 03:38, schrieb Chris Morgan: : :> > > : :> > >> Hello Everyone, : :> > >> : :> > >> In the Ops Community documentation working session today in : Vancouver, : :> > >> we made some really good progress (etherpad here: : :> > >> [8]https://etherpad.openstack.org/p/YVR-Ops-Community-Docs but : not all of : :> > >> the good stuff is yet written down). : :> > >> : :> > >> In short, we're going to course correct on maintaining the : Operators : :> > >> Guide, the HA Guide and Architecture Guide, not edit-in-place : via the : :> > >> wiki and instead try still maintaining them as code, but with a : :> > >> different, new set of owners, possibly in a new Ops-focused : repo. : :> > >> There was a strong consensus that a) code workflow >> wiki : workflow : :> > >> and that b) openstack core docs tools are just fine. 
: :> > >> : :> > >> There is a lot still to be decided on how where and when, but : we do : :> > >> have an offer of a rewrite of the HA Guide, as long as the : changes : :> > >> will be allowed to actually land, so we expect to actually : start : :> > >> showing some progress. : :> > >> : :> > >> At the end of the session, people wanted to know how to follow : along : :> > >> as various people work out how to do this... and so for now : that place : :> > >> is this very email thread. The idea is if the code for those : documents : :> > >> goes to live in a different repo, or if new contributors turn : up, or : :> > >> if a new version we will announce/discuss it here until such : time as : :> > >> we have a better home for this initiative. : :> > >> : :> > >> Cheers : :> > >> : :> > >> Chris : :> > >> : :> > >> -- : :> > >> Chris Morgan <[9]mihalis68 at gmail.com> : :> > >> _______________________________________________ : :> > >> OpenStack-operators mailing list : :> > >> [10]OpenStack-operators at lists.openstack.org : :> > >> [11]http://lists.openstack.org/cgi-bin/mailman/listinfo/ : openstack-operators : :> > >> : :> > > : :> > > : :> > > _______________________________________________ : :> > > OpenStack-operators mailing list : :> > > [12]OpenStack-operators at lists.openstack.org : :> > > [13]http://lists.openstack.org/cgi-bin/mailman/listinfo/ : openstack-operators : :> > > : :> > : : : :_______________________________________________ : :OpenStack-operators mailing list : :[14]OpenStack-operators at lists.openstack.org : :[15]http://lists.openstack.org/cgi-bin/mailman/listinfo/ : openstack-operators : _______________________________________________ : OpenStack-operators mailing list : [16]OpenStack-operators at lists.openstack.org : [17]http://lists.openstack.org/cgi-bin/mailman/listinfo/ : openstack-operators : : -- : Kind regards, : Melvin Hillsman : [18]mrhillsman at gmail.com : mobile: (832) 264-2646 : :References : : 1. mailto:jon at csail.mit.edu : 2. https://governance.openstack.org/sigs/#process-to-create-a-sig : 3. https://github.com/openstack-infra/project-config/blob/master/gerrit/projects.yaml#L5673-L5703 : 4. https://launchpad.net/osops : 5. mailto:eumel at arcor.de : 6. https://review.openstack.org/#/admin/groups/30,members : 7. https://github.com/openstack/operations-guide : 8. https://etherpad.openstack.org/p/YVR-Ops-Community-Docs : 9. mailto:mihalis68 at gmail.com : 10. mailto:OpenStack-operators at lists.openstack.org : 11. http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators : 12. mailto:OpenStack-operators at lists.openstack.org : 13. http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators : 14. mailto:OpenStack-operators at lists.openstack.org : 15. http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators : 16. mailto:OpenStack-operators at lists.openstack.org : 17. http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators : 18. mailto:mrhillsman at gmail.com From blair.bethwaite at gmail.com Thu May 24 21:59:16 2018 From: blair.bethwaite at gmail.com (Blair Bethwaite) Date: Fri, 25 May 2018 07:59:16 +1000 Subject: [Openstack-operators] pci passthrough & numa affinity Message-ID: Hi Jon, Following up to the question you asked during the HPC on OpenStack panel at the summit yesterday... You might have already seen Daniel Berrange's blog on this topic: https://www.berrange.com/posts/2017/02/16/setting-up-a-nested-kvm-guest-for-developing-testing-pci-device-assignment-with-numa/ ? 
He essentially describes how you can get around the issue of the naive flat pci bus topology in the guest - exposing numa affinity of the PCIe root ports requires newish qemu and libvirt. However, best I can tell there is no way to do this with Nova today. Are you interested in working together on a spec for this? The other related feature of interest here (newer though - no libvirt support yet I think) is gpu cliques (https://github.com/qemu/qemu/commit/dfbee78db8fdf7bc8c151c3d29504bb47438480b), would be really nice to have a way to set these up through Nova once libvirt supports it. -- Cheers, ~Blairo From jon at csail.mit.edu Thu May 24 22:19:09 2018 From: jon at csail.mit.edu (Jonathan D. Proulx) Date: Thu, 24 May 2018 15:19:09 -0700 Subject: [Openstack-operators] pci passthrough & numa affinity In-Reply-To: References: Message-ID: <20180524221909.tgdivnx6dvotdwnl@csail.mit.edu> On Fri, May 25, 2018 at 07:59:16AM +1000, Blair Bethwaite wrote: :Hi Jon, : :Following up to the question you asked during the HPC on OpenStack :panel at the summit yesterday... : :You might have already seen Daniel Berrange's blog on this topic: :https://www.berrange.com/posts/2017/02/16/setting-up-a-nested-kvm-guest-for-developing-testing-pci-device-assignment-with-numa/ :? He essentially describes how you can get around the issue of the :naive flat pci bus topology in the guest - exposing numa affinity of :the PCIe root ports requires newish qemu and libvirt. Thanks for the pointer not sure if I've seen that one, I've seen a few ways to map manually. I would have been quite surprised if nova did this so I am poking at libvirt.xml outside nova for now :However, best I can tell there is no way to do this with Nova today. :Are you interested in working together on a spec for this? I'm not yet convinced it's worth the bother, that's the crux of the question I'm investigating. Is this worth the effort? There's a meta question "do I have time to find out" :) :The other related feature of interest here (newer though - no libvirt :support yet I think) is gpu cliques :(https://github.com/qemu/qemu/commit/dfbee78db8fdf7bc8c151c3d29504bb47438480b), :would be really nice to have a way to set these up through Nova once :libvirt supports it. Thanks, -Jon From mriedemos at gmail.com Thu May 24 22:19:49 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Thu, 24 May 2018 15:19:49 -0700 Subject: [Openstack-operators] [nova] Need some feedback on the proposed heal_allocations CLI Message-ID: I've written a nova-manage placement heal_allocations CLI [1] which was a TODO from the PTG in Dublin as a step toward getting existing CachingScheduler users to roll off that (which is deprecated). During the CERN cells v1 upgrade talk it was pointed out that CERN was able to go from placement-per-cell to centralized placement in Ocata because the nova-computes in each cell would automatically recreate the allocations in Placement in a periodic task, but that code is gone once you're upgraded to Pike or later. In various other talks during the summit this week, we've talked about things during upgrades where, for instance, if placement is down for some reason during an upgrade, a user deletes an instance and the allocation doesn't get cleaned up from placement so it's going to continue counting against resource usage on that compute node even though the server instance in nova is gone. So this CLI could be expanded to help clean up situations like that, e.g. 
provide it a specific server ID and the CLI can figure out if it needs to clean things up in placement. So there are plenty of things we can build into this, but the patch is already quite large. I expect we'll also be backporting this to stable branches to help operators upgrade/fix allocation issues. It already has several things listed in a code comment inline about things to build into this later. My question is, is this good enough for a first iteration or is there something severely missing before we can merge this, like the automatic marker tracking mentioned in the code (that will probably be a non-trivial amount of code to add). I could really use some operator feedback on this to just take a look at what it already is capable of and if it's not going to be useful in this iteration, let me know what's missing and I can add that in to the patch. [1] https://review.openstack.org/#/c/565886/ -- Thanks, Matt From openstack at fried.cc Thu May 24 23:34:06 2018 From: openstack at fried.cc (Eric Fried) Date: Thu, 24 May 2018 18:34:06 -0500 Subject: [Openstack-operators] pci passthrough & numa affinity In-Reply-To: <20180524221909.tgdivnx6dvotdwnl@csail.mit.edu> References: <20180524221909.tgdivnx6dvotdwnl@csail.mit.edu> Message-ID: How long are you willing to wait? The work we're doing to use Placement from Nova ought to allow us to model both of these things nicely from the virt driver, and request them nicely from the flavor. By the end of Rocky we will have laid a large percentage of the groundwork to enable this. This is all part of the road to what we've been calling "generic device management" (GDM) -- which we hope will eventually let us remove most/all of the existing PCI passthrough code. I/we would be interested in hearing more specifics of your requirements around this, as it will help inform the GDM roadmap. And of course, upstream help & contributions would be very welcome. Thanks, efried On 05/24/2018 05:19 PM, Jonathan D. Proulx wrote: > On Fri, May 25, 2018 at 07:59:16AM +1000, Blair Bethwaite wrote: > :Hi Jon, > : > :Following up to the question you asked during the HPC on OpenStack > :panel at the summit yesterday... > : > :You might have already seen Daniel Berrange's blog on this topic: > :https://www.berrange.com/posts/2017/02/16/setting-up-a-nested-kvm-guest-for-developing-testing-pci-device-assignment-with-numa/ > :? He essentially describes how you can get around the issue of the > :naive flat pci bus topology in the guest - exposing numa affinity of > :the PCIe root ports requires newish qemu and libvirt. > > Thanks for the pointer not sure if I've seen that one, I've seen a few > ways to map manually. I would have been quite surprised if nova did > this so I am poking at libvirt.xml outside nova for now > > :However, best I can tell there is no way to do this with Nova today. > :Are you interested in working together on a spec for this? > > I'm not yet convinced it's worth the bother, that's the crux of the > question I'm investigating. Is this worth the effort? There's a meta > question "do I have time to find out" :) > > :The other related feature of interest here (newer though - no libvirt > :support yet I think) is gpu cliques > :(https://github.com/qemu/qemu/commit/dfbee78db8fdf7bc8c151c3d29504bb47438480b), > :would be really nice to have a way to set these up through Nova once > :libvirt supports it. 
> > Thanks, > -Jon > > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > From jon at csail.mit.edu Fri May 25 01:58:35 2018 From: jon at csail.mit.edu (Jonathan D. Proulx) Date: Thu, 24 May 2018 18:58:35 -0700 Subject: [Openstack-operators] pci passthrough & numa affinity In-Reply-To: References: <20180524221909.tgdivnx6dvotdwnl@csail.mit.edu> Message-ID: <20180525015835.kesc3lcdjrwlpsrg@csail.mit.edu> On Thu, May 24, 2018 at 06:34:06PM -0500, Eric Fried wrote: :How long are you willing to wait? : :The work we're doing to use Placement from Nova ought to allow us to :model both of these things nicely from the virt driver, and request them :nicely from the flavor. : :By the end of Rocky we will have laid a large percentage of the :groundwork to enable this. This is all part of the road to what we've :been calling "generic device management" (GDM) -- which we hope will :eventually let us remove most/all of the existing PCI passthrough code. : :I/we would be interested in hearing more specifics of your requirements :around this, as it will help inform the GDM roadmap. And of course, :upstream help & contributions would be very welcome. Sounds like good work. My use case is not yet very clear. I do have some upcoming discussions with users around requirements and funding so being able to say "this is on the road map and could be accelerated with developer hours" is useful. I expect patience is what will come of that but very good to know where to go when I get some clarity and if I get some resources. -Jon : :Thanks, :efried : :On 05/24/2018 05:19 PM, Jonathan D. Proulx wrote: :> On Fri, May 25, 2018 at 07:59:16AM +1000, Blair Bethwaite wrote: :> :Hi Jon, :> : :> :Following up to the question you asked during the HPC on OpenStack :> :panel at the summit yesterday... :> : :> :You might have already seen Daniel Berrange's blog on this topic: :> :https://www.berrange.com/posts/2017/02/16/setting-up-a-nested-kvm-guest-for-developing-testing-pci-device-assignment-with-numa/ :> :? He essentially describes how you can get around the issue of the :> :naive flat pci bus topology in the guest - exposing numa affinity of :> :the PCIe root ports requires newish qemu and libvirt. :> :> Thanks for the pointer not sure if I've seen that one, I've seen a few :> ways to map manually. I would have been quite surprised if nova did :> this so I am poking at libvirt.xml outside nova for now :> :> :However, best I can tell there is no way to do this with Nova today. :> :Are you interested in working together on a spec for this? :> :> I'm not yet convinced it's worth the bother, that's the crux of the :> question I'm investigating. Is this worth the effort? There's a meta :> question "do I have time to find out" :) :> :> :The other related feature of interest here (newer though - no libvirt :> :support yet I think) is gpu cliques :> :(https://github.com/qemu/qemu/commit/dfbee78db8fdf7bc8c151c3d29504bb47438480b), :> :would be really nice to have a way to set these up through Nova once :> :libvirt supports it. 
:> :> Thanks, :> -Jon :> :> :> _______________________________________________ :> OpenStack-operators mailing list :> OpenStack-operators at lists.openstack.org :> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators :> : :_______________________________________________ :OpenStack-operators mailing list :OpenStack-operators at lists.openstack.org :http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators From doug at doughellmann.com Fri May 25 12:30:40 2018 From: doug at doughellmann.com (Doug Hellmann) Date: Fri, 25 May 2018 05:30:40 -0700 Subject: [Openstack-operators] Ops Community Documentation - first anchor point In-Reply-To: <20180524141929.2vylwguebcgkjxa3@csail.mit.edu> References: <30d4f1a3668445a11fd34b271bc37e94@arcor.de> <20180524141929.2vylwguebcgkjxa3@csail.mit.edu> Message-ID: <1527251170-sup-2275@lrrr.local> Excerpts from Jonathan D. Proulx's message of 2018-05-24 07:19:29 -0700: > > My intention based on current understandign would be to create a git > repo called "osops-docs" as this fits current naming an thin initial > document we intend to put there and the others we may adopt from > docs-team. Normally I would say "yay, consistency!" In this case, let's verify that that name isn't going to have an undesirable effect when the content is published. I know the default destination directory for the publish job is taken from the repository name, which would mean we would have a URL like docs.openstack.org/osops-docs. I don't know if there is a way to override that, but the infra team will know. So, if you want a URL like docs.o.o/operations-guide instead, you'll want to check with the infra folks before creating the repo to make sure it's set up in a way to get the URL you want. Doug From jon at csail.mit.edu Fri May 25 17:37:39 2018 From: jon at csail.mit.edu (Jonathan Proulx) Date: Fri, 25 May 2018 10:37:39 -0700 Subject: [Openstack-operators] Ops Community Documentation - first anchor point In-Reply-To: <1527251170-sup-2275@lrrr.local> References: <30d4f1a3668445a11fd34b271bc37e94@arcor.de> <20180524141929.2vylwguebcgkjxa3@csail.mit.edu> <1527251170-sup-2275@lrrr.local> Message-ID: <37FF3737-79CA-4ED2-B078-900F26F99C53@csail.mit.edu> On May 25, 2018 5:30:40 AM PDT, Doug Hellmann wrote: >Excerpts from Jonathan D. Proulx's message of 2018-05-24 07:19:29 >-0700: >> >> My intention based on current understandign would be to create a git >> repo called "osops-docs" as this fits current naming an thin initial >> document we intend to put there and the others we may adopt from >> docs-team. > >Normally I would say "yay, consistency!" In this case, let's verify >that that name isn't going to have an undesirable effect when the >content is published. > >I know the default destination directory for the publish job is >taken from the repository name, which would mean we would have a >URL like docs.openstack.org/osops-docs. I don't know if there is a >way to override that, but the infra team will know. So, if you want >a URL like docs.o.o/operations-guide instead, you'll want to check >with the infra folks before creating the repo to make sure it's set >up in a way to get the URL you want. Names are hard! Thanks for pointing out the implications. -Jon -- Sent from my Android device with K-9 Mail. Please excuse my brevity. 
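On the earlier PCI passthrough / NUMA affinity sub-thread (Blair's pointer to
Daniel Berrange's post, and Jon's hand-editing of libvirt.xml outside Nova):
the technique in that post amounts to adding a NUMA-node-aware PCIe expander
bus plus a root port to the guest, and then placing the passed-through device
behind that port. A rough, hand-written sketch of the idea is below; the
controller indexes, busNr and PCI addresses are made-up example values, the
guest also needs a matching <numa> topology defined, and the exact syntax
should be checked against the libvirt/QEMU versions in use rather than copied
from here:

```xml
<!-- Expander bus associated with guest NUMA node 1 (example values). -->
<controller type='pci' index='10' model='pcie-expander-bus'>
  <target busNr='180'>
    <node>1</node>
  </target>
</controller>
<!-- Root port plugged into the expander bus above (bus 0x0a = index 10). -->
<controller type='pci' index='11' model='pcie-root-port'>
  <address type='pci' domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
</controller>
<!-- Passed-through host device placed behind that root port (bus 0x0b = index 11). -->
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x81' slot='0x00' function='0x0'/>
  </source>
  <address type='pci' domain='0x0000' bus='0x0b' slot='0x00' function='0x0'/>
</hostdev>
```

Whether this is worth wiring into Nova itself, rather than doing one-off
libvirt.xml edits, is exactly the open question in that exchange.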
From sbauza at redhat.com Mon May 28 12:31:59 2018 From: sbauza at redhat.com (Sylvain Bauza) Date: Mon, 28 May 2018 14:31:59 +0200 Subject: [Openstack-operators] [nova] Need some feedback on the proposed heal_allocations CLI In-Reply-To: References: Message-ID: On Fri, May 25, 2018 at 12:19 AM, Matt Riedemann wrote: > I've written a nova-manage placement heal_allocations CLI [1] which was a > TODO from the PTG in Dublin as a step toward getting existing > CachingScheduler users to roll off that (which is deprecated). > > During the CERN cells v1 upgrade talk it was pointed out that CERN was > able to go from placement-per-cell to centralized placement in Ocata > because the nova-computes in each cell would automatically recreate the > allocations in Placement in a periodic task, but that code is gone once > you're upgraded to Pike or later. > > In various other talks during the summit this week, we've talked about > things during upgrades where, for instance, if placement is down for some > reason during an upgrade, a user deletes an instance and the allocation > doesn't get cleaned up from placement so it's going to continue counting > against resource usage on that compute node even though the server instance > in nova is gone. So this CLI could be expanded to help clean up situations > like that, e.g. provide it a specific server ID and the CLI can figure out > if it needs to clean things up in placement. > > So there are plenty of things we can build into this, but the patch is > already quite large. I expect we'll also be backporting this to stable > branches to help operators upgrade/fix allocation issues. It already has > several things listed in a code comment inline about things to build into > this later. > > My question is, is this good enough for a first iteration or is there > something severely missing before we can merge this, like the automatic > marker tracking mentioned in the code (that will probably be a non-trivial > amount of code to add). I could really use some operator feedback on this > to just take a look at what it already is capable of and if it's not going > to be useful in this iteration, let me know what's missing and I can add > that in to the patch. > > [1] https://review.openstack.org/#/c/565886/ > > It does sound for me a good way to help operators. That said, given I'm now working on using Nested Resource Providers for VGPU inventories, I wonder about a possible upgrade problem with VGPU allocations. Given that : - in Queens, VGPU inventories are for the root RP (ie. the compute node RP), but, - in Rocky, VGPU inventories will be for children RPs (ie. against a specific VGPU type), then if we have VGPU allocations in Queens, when upgrading to Rocky, we should maybe recreate the allocations to a specific other inventory ? Hope you see the problem with upgrading by creating nested RPs ? > -- > > Thanks, > > Matt > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From zioproto at gmail.com Mon May 28 12:50:10 2018 From: zioproto at gmail.com (Saverio Proto) Date: Mon, 28 May 2018 14:50:10 +0200 Subject: [Openstack-operators] [openstack-dev][publiccloud-wg][k8s][octavia] OpenStack Load Balancer APIs and K8s In-Reply-To: References: Message-ID: Hello Chris, I finally had the time to write about my deployment: https://cloudblog.switch.ch/2018/05/22/openstack-horizon-runs-on-kubernetes-in-production-at-switch/ in this blog post I explain why I use the kubernetes nginx-ingress instead of Openstack LBaaS. Cheers, Saverio 2018-03-15 23:55 GMT+01:00 Chris Hoge : > Hi everyone, > > I wanted to notify you of a thread I started in openstack-dev about the state > of the OpenStack load balancer APIs and the difficulty in integrating them > with Kubernetes. This in part directly relates to current public and private > deployments, and any feedback you have would be appreciated. Especially > feedback on which version of the load balancer APIs you deploy, and if you > haven't moved on to Octavia, why. > > http://lists.openstack.org/pipermail/openstack-dev/2018-March/128399.html > > Thanks in advance, > Chris > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators From pkovar at redhat.com Mon May 28 14:03:41 2018 From: pkovar at redhat.com (Petr Kovar) Date: Mon, 28 May 2018 16:03:41 +0200 Subject: [Openstack-operators] Ops Community Documentation - first anchor point In-Reply-To: <20180524141929.2vylwguebcgkjxa3@csail.mit.edu> References: <30d4f1a3668445a11fd34b271bc37e94@arcor.de> <20180524141929.2vylwguebcgkjxa3@csail.mit.edu> Message-ID: <20180528160341.be386cd2a4562d2981f470fc@redhat.com> On Thu, 24 May 2018 07:19:29 -0700 "Jonathan D. Proulx" wrote: > My intention based on current understandign would be to create a git > repo called "osops-docs" as this fits current naming an thin initial > document we intend to put there and the others we may adopt from > docs-team. So, just to clarify, the current plan is for your group to take ownership of the following docs? https://github.com/openstack/openstack-manuals/tree/a1f1748478125ccd68d90a98ccc06c7ec359d3a0/doc/ops-guide https://github.com/openstack/openstack-manuals/tree/master/doc/arch-design https://github.com/openstack/openstack-manuals/tree/master/doc/ha-guide Note that there is also https://github.com/openstack/openstack-manuals/tree/master/doc/ha-guide-draft which you probably want to merge with the ha-guide going forward (or retire one or the other). As for naming the repo, this is really up to you, but it should be something clear and easily recognizable by your audience. I can help with moving some of the content around, but as Doug pointed out, a few points about actual publishing need to be clarified first with the infra team. > My understanding being they don't to have this type of > documentention due to much reduced team size and prefer it live with > subject matter experts. It that correct? If that's not correct I'm > not personally opposed to trying this under docs. We'll need to > maintain enough contributors and reviewers to make the work flow go in > either location and that's my understanding of the basic issue not > where it lives. If you want more reviewers involved, I'd recommended inviting the reviewers from the docs group. > This naming would also match other repos wich could be consolidated into an > "osops" repo to rule them all. 
That may make sense as I think there's > significant overlap in set of people who might contribute, but that > can be a parallel conversation. > > Doug looking at new project docs I think most of it is clear enough to > me. Since it's not code I can skip all th PyPi stuff yes? The repo > creation seems pretty clear and I can steal the CI stuff from similar > projects. Might be best to look into how https://github.com/openstack/security-doc is configured as that repo contains a number of separate documents, all managed by one group. > I'm a little unclear on the Storyboard bit I've not done > much contribution lately and haven't storyboarded. Is that relevant > (or at least relevent at first) for this use case? If it is I > probably have more questions. I'd suggest either having your own storyboard or launchpad project so that users can file bugs somewhere, and give you feedback. storyboard might be a better option since all OpenStack projects all likely to migrate to it from launchpad at some point or another. Cheers, pk From gael.therond at gmail.com Mon May 28 17:09:50 2018 From: gael.therond at gmail.com (Flint WALRUS) Date: Mon, 28 May 2018 19:09:50 +0200 Subject: [Openstack-operators] [openstack-dev][publiccloud-wg][k8s][octavia] OpenStack Load Balancer APIs and K8s In-Reply-To: References: Message-ID: Hi everyone, I’m currently deploying Octavia as our global LBaaS for a lot of various workload such as Kubernetes ingress LB. We use Queens and plan to upgrade to rocky as soon as it reach the stable release and we use the native Octavia APIv2 (Not a neutron redirect etc). What do you need to know? Le lun. 28 mai 2018 à 14:50, Saverio Proto a écrit : > Hello Chris, > > I finally had the time to write about my deployment: > > https://cloudblog.switch.ch/2018/05/22/openstack-horizon-runs-on-kubernetes-in-production-at-switch/ > > in this blog post I explain why I use the kubernetes nginx-ingress > instead of Openstack LBaaS. > > Cheers, > > Saverio > > > 2018-03-15 23:55 GMT+01:00 Chris Hoge : > > Hi everyone, > > > > I wanted to notify you of a thread I started in openstack-dev about the > state > > of the OpenStack load balancer APIs and the difficulty in integrating > them > > with Kubernetes. This in part directly relates to current public and > private > > deployments, and any feedback you have would be appreciated. Especially > > feedback on which version of the load balancer APIs you deploy, and if > you > > haven't moved on to Octavia, why. > > > > > http://lists.openstack.org/pipermail/openstack-dev/2018-March/128399.html > > > > > > Thanks in advance, > > Chris > > _______________________________________________ > > OpenStack-operators mailing list > > OpenStack-operators at lists.openstack.org > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zioproto at gmail.com Mon May 28 19:26:01 2018 From: zioproto at gmail.com (Saverio Proto) Date: Mon, 28 May 2018 21:26:01 +0200 Subject: [Openstack-operators] [openstack-dev][publiccloud-wg][k8s][octavia] OpenStack Load Balancer APIs and K8s In-Reply-To: References: Message-ID: Hello Flint, what version of Kubernetes are you deploying on top of Openstack ? are you using the external Openstack cloud controller ? 
I tested it and it works only if you have at least v1.10.3.

Look at this page:
https://github.com/kubernetes/cloud-provider-openstack/tree/master/examples/loadbalancers

Please test that you can do SSL termination on the load balancer,
describing it with Kubernetes yaml files. That is important for
production operation. Also test whether you have downtime when you have
to renew SSL certificates.

You will also want to check that the traffic that hits your pods has the
HTTP header X-Forwarded-For or, even better, that the IP packets you
receive at the Pods have the source IP address of the original client.

If needed, test everything with IPv6 as well.

I personally decided not to use Octavia, but to go for the Kubernetes
ingress-nginx
https://github.com/kubernetes/ingress-nginx

The key idea is that instead of OpenStack controlling the load balancer
(Octavia spinning up a VM running nginx), you have Kubernetes controlling
the load balancer, running an nginx container. In the end you need an
nginx reverse proxy either way; you have to decide whether that resource
is managed by OpenStack or by Kubernetes.

Keep in mind that if you go for a Kubernetes ingress controller you can
avoid using nginx. There is already an alternative ha-proxy
implementation:
https://www.haproxy.com/blog/haproxy_ingress_controller_for_kubernetes/

Cheers,

Saverio

2018-05-28 19:09 GMT+02:00 Flint WALRUS :
> Hi everyone, I’m currently deploying Octavia as our global LBaaS for a lot
> of various workload such as Kubernetes ingress LB.
>
> We use Queens and plan to upgrade to rocky as soon as it reach the stable
> release and we use the native Octavia APIv2 (Not a neutron redirect etc).
>
> What do you need to know?
>
> Le lun. 28 mai 2018 à 14:50, Saverio Proto a écrit :
>>
>> Hello Chris,
>>
>> I finally had the time to write about my deployment:
>>
>> https://cloudblog.switch.ch/2018/05/22/openstack-horizon-runs-on-kubernetes-in-production-at-switch/
>>
>> in this blog post I explain why I use the kubernetes nginx-ingress
>> instead of Openstack LBaaS.
>>
>> Cheers,
>>
>> Saverio
>>
>>
>> 2018-03-15 23:55 GMT+01:00 Chris Hoge :
>> > Hi everyone,
>> >
>> > I wanted to notify you of a thread I started in openstack-dev about the
>> > state
>> > of the OpenStack load balancer APIs and the difficulty in integrating
>> > them
>> > with Kubernetes. This in part directly relates to current public and
>> > private
>> > deployments, and any feedback you have would be appreciated. Especially
>> > feedback on which version of the load balancer APIs you deploy, and if
>> > you
>> > haven't moved on to Octavia, why.
>> >
>> >
>> > http://lists.openstack.org/pipermail/openstack-dev/2018-March/128399.html
>> >
>> >
>> > Thanks in advance,
>> > Chris
>> > _______________________________________________
>> > OpenStack-operators mailing list
>> > OpenStack-operators at lists.openstack.org
>> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>
>> _______________________________________________
>> OpenStack-operators mailing list
>> OpenStack-operators at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

From gael.therond at gmail.com  Mon May 28 19:35:22 2018
From: gael.therond at gmail.com (Flint WALRUS)
Date: Mon, 28 May 2018 21:35:22 +0200
Subject: [Openstack-operators] [openstack-dev][publiccloud-wg][k8s][octavia] OpenStack Load Balancer APIs and K8s
In-Reply-To:
References:
Message-ID:

Using 1.10. Something, I’ll have to check tomorrow morning.
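For readers following the load-balancer options Saverio describes above: a
minimal Kubernetes Service of type LoadBalancer is enough to exercise the
cloud-provider integration he mentions. The sketch below is a generic example
(names and ports are placeholders, and any provider-specific annotations for
Octavia floating IPs or TLS termination should be taken from the
cloud-provider-openstack examples page rather than from here). Setting
externalTrafficPolicy: Local is the standard Kubernetes way to preserve the
original client source IP at the pods.

```yaml
# Generic LoadBalancer Service sketch; provider-specific annotations
# (floating IPs, TLS termination, etc.) are deliberately omitted and
# should be looked up in the cloud-provider-openstack examples.
apiVersion: v1
kind: Service
metadata:
  name: web-lb                 # placeholder name
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local # preserve the client source IP
  selector:
    app: web                   # placeholder pod selector
  ports:
    - name: https
      port: 443
      targetPort: 8443
```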
I don’t want to use nginx or the provided haproxy as my Octavia LBaaS is a global service and because the less I rely on Kube the more I’m happy ;-) Le lun. 28 mai 2018 à 21:26, Saverio Proto a écrit : > Hello Flint, > > what version of Kubernetes are you deploying on top of Openstack ? > > are you using the external Openstack cloud controller ? I tested it an > it works only if you have at least v.1.10.3 > > Look at this page: > > https://github.com/kubernetes/cloud-provider-openstack/tree/master/examples/loadbalancers > > Please test that you can make a SSL termination on the loadbalancer, > describing it with Kubernetes yaml files. That is important for > production operation. Test also if you have downtime when you have to > renew SSL certificates. > > You will also want to check that traffic that hits your pods has the > HTTP header X-Forwarded-For, or even better the IP packets you receive > at the Pods have the source IP address of the original client. > > If needed test everything also with IPv6 > > I personally decided not to use Octavia, but to go for the Kubernetes > ingress-nginx > https://github.com/kubernetes/ingress-nginx > > The key idea is that instead of Openstack controlling the LoadBalancer > having Octavia spinning up a VM running nginx, you have Kubernetes > controlling the LoadBalancer, running a nginx-container. > At the end you need a nginx to reverse proxy, you have to decided if > this resource is managed by Openstack or Kubernetes. > > Keep in mind that if you go for a kubernetes ingress controller you > can avoid using nginx. There is already an alternative ha-proxy > implementation: > https://www.haproxy.com/blog/haproxy_ingress_controller_for_kubernetes/ > > Cheers, > > Saverio > > 2018-05-28 19:09 GMT+02:00 Flint WALRUS : > > Hi everyone, I’m currently deploying Octavia as our global LBaaS for a > lot > > of various workload such as Kubernetes ingress LB. > > > > We use Queens and plan to upgrade to rocky as soon as it reach the stable > > release and we use the native Octavia APIv2 (Not a neutron redirect etc). > > > > What do you need to know? > > > > Le lun. 28 mai 2018 à 14:50, Saverio Proto a écrit > : > >> > >> Hello Chris, > >> > >> I finally had the time to write about my deployment: > >> > >> > https://cloudblog.switch.ch/2018/05/22/openstack-horizon-runs-on-kubernetes-in-production-at-switch/ > >> > >> in this blog post I explain why I use the kubernetes nginx-ingress > >> instead of Openstack LBaaS. > >> > >> Cheers, > >> > >> Saverio > >> > >> > >> 2018-03-15 23:55 GMT+01:00 Chris Hoge : > >> > Hi everyone, > >> > > >> > I wanted to notify you of a thread I started in openstack-dev about > the > >> > state > >> > of the OpenStack load balancer APIs and the difficulty in integrating > >> > them > >> > with Kubernetes. This in part directly relates to current public and > >> > private > >> > deployments, and any feedback you have would be appreciated. > Especially > >> > feedback on which version of the load balancer APIs you deploy, and if > >> > you > >> > haven't moved on to Octavia, why. 
> >> > > >> > > http://lists.openstack.org/pipermail/openstack-dev/2018-March/128399.html > >> > < http://lists.openstack.org/pipermail/openstack-dev/2018-March/128399.html> > >> > > >> > Thanks in advance, > >> > Chris > >> > _______________________________________________ > >> > OpenStack-operators mailing list > >> > OpenStack-operators at lists.openstack.org > >> > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > >> > >> _______________________________________________ > >> OpenStack-operators mailing list > >> OpenStack-operators at lists.openstack.org > >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From massimo.sgaravatto at gmail.com  Tue May 29 09:41:46 2018
From: massimo.sgaravatto at gmail.com (Massimo Sgaravatto)
Date: Tue, 29 May 2018 11:41:46 +0200
Subject: [Openstack-operators] Problems with AggregateMultiTenancyIsolation while migrating an instance
Message-ID:

I have a small testbed OpenStack cloud (running Ocata) where I am trying to
debug a problem with Nova scheduling. In short: I see different behaviors
when I create a new VM and when I try to migrate a VM.

Since I want to partition the Cloud so that each project uses only certain
compute nodes, I created one host aggregate per project (see also this
thread:
http://lists.openstack.org/pipermail/openstack-operators/2018-February/014831.html
)

The host-aggregate for my project is:

# nova aggregate-show 52
+----+-----------+-------------------+--------------------------------------------------------------+-----------------------------------------------------------------------------------------------+--------------------------------------+
| Id | Name      | Availability Zone | Hosts                                                        | Metadata                                                                                        | UUID                                 |
+----+-----------+-------------------+--------------------------------------------------------------+-----------------------------------------------------------------------------------------------+--------------------------------------+
| 52 | SgaraPrj1 | nova              | 'compute-01.cloud.pd.infn.it', 'compute-02.cloud.pd.infn.it' | 'availability_zone=nova', 'filter_tenant_id=ee1865a76440481cbcff08544c7d580a', 'size=normal'   | 675f6291-6997-470d-87e1-e9ea199a379f |
+----+-----------+-------------------+--------------------------------------------------------------+-----------------------------------------------------------------------------------------------+--------------------------------------+

The same compute nodes are shared by other projects (for which specific
host aggregates, like this one, have been created). The other compute node
(I have only 3 compute nodes in this small testbed) is targeted at other
projects (for which specific host aggregates exist).

This is what I have in nova.conf wrt scheduling filters:

enabled_filters = AggregateInstanceExtraSpecsFilter,AggregateMultiTenancyIsolation,RetryFilter,AvailabilityZoneFilter,RamFilter,CoreFilter,AggregateRamFilter,AggregateCoreFilter,DiskFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter

If I try to create a VM, I see from the scheduler log [*] that
AggregateMultiTenancyIsolation selects only 2 compute nodes, as expected.

But if I then try to migrate the very same VM, it reports that no valid
host was found:

# nova migrate afaf2a2d-7ff8-4e52-a89a-031ee079a9ba
ERROR (BadRequest): No valid host was found.
No valid host found for cold migrate (HTTP 400) (Request-ID: req-45b8afd5-9683-40a6-8416-295563e37e34) And according to the scheduler log the problem is with the AggregateMultiTenancyIsolation which returned 0 hosts (while I would have expected one): 2018-05-29 11:12:56.375 19428 INFO nova.scheduler.host_manager [req-45b8afd5-9683-40a6-8416-295563e37e34 9bd03f63fa9d4beb8de31e6c2f2c8d12 56c3f5c047e74a78a714\ 38c4412e6e13 - - -\ ] Host filter ignoring hosts: compute-02.cloud.pd.infn.it 2018-05-29 11:12:56.375 19428 DEBUG nova.filters [req-45b8afd5-9683-40a6-8416-295563e37e34 9bd03f63fa9d4beb8de31e6c2f2c8d12 56c3f5c047e74a78a71438c4412e6e13 -\ - -] Starting wit\ h 2 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:70 2018-05-29 11:12:56.376 19428 DEBUG nova.filters [req-45b8afd5-9683-40a6-8416-295563e37e34 9bd03f63fa9d4beb8de31e6c2f2c8d12 56c3f5c047e74a78a71438c4412e6e13 -\ - -] Filter Aggre\ gateInstanceExtraSpecsFilter returned 2 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104 2018-05-29 11:12:56.377 19428 DEBUG nova.scheduler.filters.aggregate_multitenancy_isolation [req-45b8afd5-9683-40a6-8416-295563e37e34 9bd03f63fa9d4beb8de31e6c\ 2f2c8d12 56c3f5c04\ 7e74a78a71438c4412e6e13 - - -] (compute-01.cloud.pd.infn.it, compute-01.cloud.pd.infn.it) ram: 12797MB disk: 48128MB io_ops: 0 instances: 0 fails tenant id on\ aggregate host_pa\ sses /usr/lib/python2.7/site-packages/nova/scheduler/filters/aggregate_multitenancy_isolation.py:50 2018-05-29 11:12:56.378 19428 DEBUG nova.scheduler.filters.aggregate_multitenancy_isolation [req-45b8afd5-9683-40a6-8416-295563e37e34 9bd03f63fa9d4beb8de31e6c\ 2f2c8d12 56c3f5c04\ 7e74a78a71438c4412e6e13 - - -] (compute-03.cloud.pd.infn.it, compute-03.cloud.pd.infn.it) ram: 8701MB disk: -4096MB io_ops: 0 instances: 0 fails tenant id on \ aggregate host_pas\ ses /usr/lib/python2.7/site-packages/nova/scheduler/filters/aggregate_multitenancy_isolation.py:50 2018-05-29 11:12:56.378 19428 INFO nova.filters [req-45b8afd5-9683-40a6-8416-295563e37e34 9bd03f63fa9d4beb8de31e6c2f2c8d12 56c3f5c047e74a78a71438c4412e6e13 - \ - -] Filter Aggreg\ ateMultiTenancyIsolation returned 0 hosts I am confused ... Any hints ? 
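For context on why the filter can return 0 hosts here: AggregateMultiTenancyIsolation only passes a host that belongs to a tenant-restricted aggregate when the project_id carried by the scheduling request matches one of the aggregate's filter_tenant_id values. A simplified paraphrase of that check (for illustration only, not the exact nova code) looks like this:

def host_passes(host_aggregate_metadata, request_project_id):
    # host_aggregate_metadata: merged metadata of the aggregates the host is
    # in, e.g. {'filter_tenant_id': {'ee1865a76440481cbcff08544c7d580a'}}
    tenant_ids = host_aggregate_metadata.get('filter_tenant_id')
    if not tenant_ids:
        return True   # host is not tenant-restricted
    if request_project_id not in tenant_ids:
        # this is the "fails tenant id on aggregate" line in the logs above
        return False
    return True

The restriction itself is plain aggregate metadata (set for example with nova aggregate-set-metadata 52 filter_tenant_id=<project_id>), so the interesting question becomes which project_id the migration request is actually carrying.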
Thanks, Massimo [*] 2018-05-29 11:09:54.328 19428 DEBUG nova.filters [req-1a838e77-8042-4550-b157-4943445119a2 ab573ba3ea014b778193b6922ffffe6d ee1865a76440481cbcff08544c7d580a -\ - -] Filter AggregateInstanceExtraSpecsFilter returned 3 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104 2018-05-29 11:09:54.330 19428 DEBUG nova.filters [req-1a838e77-8042-4550-b157-4943445119a2 ab573ba3ea014b778193b6922ffffe6d ee1865a76440481cbcff08544c7d580a -\ - -] Filter AggregateMultiTenancyIsolation returned 2 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104 2018-05-29 11:09:54.332 19428 DEBUG nova.filters [req-1a838e77-8042-4550-b157-4943445119a2 ab573ba3ea014b778193b6922ffffe6d ee1865a76440481cbcff08544c7d580a -\ - -] Filter RetryFilter returned 2 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104 2018-05-29 11:09:54.332 19428 DEBUG nova.filters [req-1a838e77-8042-4550-b157-4943445119a2 ab573ba3ea014b778193b6922ffffe6d ee1865a76440481cbcff08544c7d580a -\ - -] Filter AvailabilityZoneFilter returned 2 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104 2018-05-29 11:09:54.333 19428 DEBUG nova.filters [req-1a838e77-8042-4550-b157-4943445119a2 ab573ba3ea014b778193b6922ffffe6d ee1865a76440481cbcff08544c7d580a -\ - -] Filter RamFilter returned 2 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104 2018-05-29 11:09:54.334 19428 DEBUG nova.filters [req-1a838e77-8042-4550-b157-4943445119a2 ab573ba3ea014b778193b6922ffffe6d ee1865a76440481cbcff08544c7d580a -\ - -] Filter CoreFilter returned 2 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104 2018-05-29 11:09:54.334 19428 DEBUG nova.filters [req-1a838e77-8042-4550-b157-4943445119a2 ab573ba3ea014b778193b6922ffffe6d ee1865a76440481cbcff08544c7d580a -\ - -] Filter AggregateRamFilter returned 2 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104 2018-05-29 11:09:54.335 19428 DEBUG nova.filters [req-1a838e77-8042-4550-b157-4943445119a2 ab573ba3ea014b778193b6922ffffe6d ee1865a76440481cbcff08544c7d580a -\ - -] Filter AggregateCoreFilter returned 2 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104 2018-05-29 11:09:54.335 19428 DEBUG nova.filters [req-1a838e77-8042-4550-b157-4943445119a2 ab573ba3ea014b778193b6922ffffe6d ee1865a76440481cbcff08544c7d580a -\ - -] Filter DiskFilter returned 2 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104 2018-05-29 11:09:54.336 19428 DEBUG nova.filters [req-1a838e77-8042-4550-b157-4943445119a2 ab573ba3ea014b778193b6922ffffe6d ee1865a76440481cbcff08544c7d580a -\ - -] Filter ComputeFilter returned 2 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104 2018-05-29 11:09:54.337 19428 DEBUG nova.filters [req-1a838e77-8042-4550-b157-4943445119a2 ab573ba3ea014b778193b6922ffffe6d ee1865a76440481cbcff08544c7d580a -\ - -] Filter ComputeCapabilitiesFilter returned 2 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104 2018-05-29 11:09:54.338 19428 DEBUG nova.filters [req-1a838e77-8042-4550-b157-4943445119a2 ab573ba3ea014b778193b6922ffffe6d ee1865a76440481cbcff08544c7d580a -\ - -] Filter ImagePropertiesFilter returned 2 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104 2018-05-29 11:09:54.339 19428 DEBUG nova.filters [req-1a838e77-8042-4550-b157-4943445119a2 
ab573ba3ea014b778193b6922ffffe6d ee1865a76440481cbcff08544c7d580a -\ - -] Filter ServerGroupAntiAffinityFilter returned 2 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104 2018-05-29 11:09:54.339 19428 DEBUG nova.filters [req-1a838e77-8042-4550-b157-4943445119a2 ab573ba3ea014b778193b6922ffffe6d ee1865a76440481cbcff08544c7d580a -\ - -] Filter ServerGroupAffinityFilter returned 2 host(s) get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mihalis68 at gmail.com Tue May 29 11:14:03 2018 From: mihalis68 at gmail.com (Chris Morgan) Date: Tue, 29 May 2018 07:14:03 -0400 Subject: [Openstack-operators] Proposing no Ops Meetups team meeting this week Message-ID: Some of us will be only just returning to work today after being away all week last week for the (successful) OpenStack Summit, therefore I propose we skip having a meeting today but regroup next week? Chris -- Chris Morgan -------------- next part -------------- An HTML attachment was scrubbed... URL: From tomi.juvonen at nokia.com Tue May 29 11:14:12 2018 From: tomi.juvonen at nokia.com (Juvonen, Tomi (Nokia - FI/Espoo)) Date: Tue, 29 May 2018 11:14:12 +0000 Subject: [Openstack-operators] New OpenStack project for rolling maintenance and upgrade in interaction with application on top of it Message-ID: Hi, I am the PTL of the OPNFV Doctor project. I have been working for a couple of years figuring out the infrastructure maintenance in interaction with application on top of it. Looked into Nova, Craton and had several Ops sessions. Past half a year there has been couple of different POCs, the last in March in the ONS [1] [2] In OpenStack Vancouver summit last week it was time to present [3]. In Forum discussion following the presentation it was whether to make this just by utilizing different existing projects, but to make this generic, pluggable, easily adapted and future proof, it now goes down to start what I almost started a couple of years ago; the OpenStack Fenix project [4]. On behalf of OPNFV Doctor I would welcome any last thoughts before starting the project and would also love to see somebody joining to make the Fenix fly. Main use cases to list most of them: * As a cloud admin I want to maintain and upgrade my infrastructure in a rolling fashion. * As a cloud admin I want to have a pluggable workflow to maintain and upgrade my infrastructure, to ensure it can be done with complicated infrastructure components and in interaction with different application payloads on top of it. * As a infrastructure service, I need to know whether infrastructure unavailability is because of planned maintenance. * As a critical application owner, I want to be aware of any planned downtime effecting to my service. * As a critical application owner, I want to have interaction with infrastructure rolling maintenance workflow to have a time window to ensure zero down time for my service and to be able to decide to make admin actions like migration of my instance. * As an application owner, I need to know when admin action like migration is complete. * As an application owner, I want to know about new capabilities coming because of infrastructure maintenance or upgrade, so I can take it also into use by my application. This could be hardware capability or for example OpenStack upgrade. 
* As a critical application that needs to scale by varying load, I need to interactively know about infrastructure resources scaling up and down, so I can scale my application at the same and keeping zero downtime for my service * As a critical application, I want to have retirement of my service done in controlled fashion. [1] Infrastructure Maintenance & Upgrade: Zero VNF Downtime with OPNFV Doctor on OCP Hardware video [2] Infrastructure Maintenance & Upgrade: Zero VNF Downtime with OPNFV Doctor on OCP Hardware slides [3] How to gain VNF zero down-time during Infrastructure Maintenance and Upgrade [4] Fenix project wiki [5] Doctor design guideline draft Best Regards, Tomi Juvonen -------------- next part -------------- An HTML attachment was scrubbed... URL: From sean.mcginnis at gmx.com Tue May 29 11:42:34 2018 From: sean.mcginnis at gmx.com (Sean McGinnis) Date: Tue, 29 May 2018 06:42:34 -0500 Subject: [Openstack-operators] Proposing no Ops Meetups team meeting this week In-Reply-To: References: Message-ID: <1de2dd4e-d41b-5a31-84db-4d9ebbaae3c2@gmx.com> On 05/29/2018 06:14 AM, Chris Morgan wrote: > Some of us will be only just returning to work today after being away > all week last week for the (successful) OpenStack Summit, therefore I > propose we skip having a meeting today but regroup next week? > > Chris Makes sense to me. I know I have a lot of catching up to do. From emccormick at cirrusseven.com Tue May 29 11:53:19 2018 From: emccormick at cirrusseven.com (Erik McCormick) Date: Tue, 29 May 2018 07:53:19 -0400 Subject: [Openstack-operators] Proposing no Ops Meetups team meeting this week In-Reply-To: References: Message-ID: On Tue, May 29, 2018, 7:15 AM Chris Morgan wrote: > Some of us will be only just returning to work today after being away all > week last week for the (successful) OpenStack Summit, therefore I propose > we skip having a meeting today but regroup next week? > +1 > Chris > > -- > Chris Morgan > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators -Erik > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Tue May 29 14:02:58 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Tue, 29 May 2018 09:02:58 -0500 Subject: [Openstack-operators] [nova] Need some feedback on the proposed heal_allocations CLI In-Reply-To: References: Message-ID: <97fc2ff4-ef97-4c36-0f89-3ef8d9c874fb@gmail.com> On 5/28/2018 7:31 AM, Sylvain Bauza wrote: > That said, given I'm now working on using Nested Resource Providers for > VGPU inventories, I wonder about a possible upgrade problem with VGPU > allocations. Given that : >  - in Queens, VGPU inventories are for the root RP (ie. the compute > node RP), but, >  - in Rocky, VGPU inventories will be for children RPs (ie. against a > specific VGPU type), then > > if we have VGPU allocations in Queens, when upgrading to Rocky, we > should maybe recreate the allocations to a specific other inventory ? For how the heal_allocations CLI works today, if the instance has any allocations in placement, it skips that instance. So this scenario wouldn't be a problem. > > Hope you see the problem with upgrading by creating nested RPs ? Yes, the CLI doesn't attempt to have any knowledge about nested resource providers, it just takes the flavor embedded in the instance and creates allocations against the compute node provider using the flavor. 
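As a rough illustration of the "allocations from the flavor" behaviour described here (a sketch only, not the actual heal_allocations code; the exact placement payload depends on the microversion used):

def allocations_from_flavor(flavor, compute_node_rp_uuid):
    # Resource amounts implied by the flavor embedded in the instance ...
    resources = {
        'VCPU': flavor['vcpus'],
        'MEMORY_MB': flavor['ram'],
        'DISK_GB': flavor['root_gb'] + flavor['ephemeral_gb'],
    }
    # ... shaped as a placement allocation against the compute node resource
    # provider; the CLI would then PUT something like this to
    # /allocations/<instance_uuid> in the placement API.
    return {'allocations': {compute_node_rp_uuid: {'resources': resources}}}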
It has no explicit knowledge about granular request groups or more advanced features like that. -- Thanks, Matt From openstack at medberry.net Tue May 29 14:27:02 2018 From: openstack at medberry.net (David Medberry) Date: Tue, 29 May 2018 08:27:02 -0600 Subject: [Openstack-operators] Proposing no Ops Meetups team meeting this week In-Reply-To: References: Message-ID: Good plan. I'm just getting on email now and hadn't even considered IRC yet. :^) On Tue, May 29, 2018 at 5:53 AM, Erik McCormick wrote: > > > On Tue, May 29, 2018, 7:15 AM Chris Morgan wrote: > >> Some of us will be only just returning to work today after being away all >> week last week for the (successful) OpenStack Summit, therefore I propose >> we skip having a meeting today but regroup next week? >> > > +1 > > >> Chris >> >> -- >> Chris Morgan >> _______________________________________________ >> OpenStack-operators mailing list >> OpenStack-operators at lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > > -Erik > >> >> > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaypipes at gmail.com Tue May 29 16:10:00 2018 From: jaypipes at gmail.com (Jay Pipes) Date: Tue, 29 May 2018 12:10:00 -0400 Subject: [Openstack-operators] Problems with AggregateMultiTenancyIsolation while migrating an instance In-Reply-To: References: Message-ID: The hosts you are attempting to migrate *to* do not have the filter_tenant_id property set to the same tenant ID as the compute host 2 that originally hosted the instance. That is why you see this in the scheduler logs when evaluating the fitness of compute host 1 and compute host 3: "fails tenant id" Best, -jay On 05/29/2018 05:41 AM, Massimo Sgaravatto wrote: > I have a small testbed OpenStack cloud (running Ocata) where I am trying > to debug a problem with Nova scheduling. 
> > > In short: I see different behaviors when I create a new VM and when I > try to migrate a VM > > > Since I want to partition the Cloud so that each project uses only > certain compute nodes, I created one host aggregate per project (see > also this thread: > http://lists.openstack.org/pipermail/openstack-operators/2018-February/014831.html) > > > The host-aggregate for my project is: > > # nova  aggregate-show 52 > +----+-----------+-------------------+--------------------------------------------------------------+----------------------------------------------------------------------------------------------+--------------------------------------+ > | Id | Name      | Availability Zone | Hosts >                             | Metadata >                                                    | UUID >                    | > +----+-----------+-------------------+--------------------------------------------------------------+----------------------------------------------------------------------------------------------+--------------------------------------+ > | 52 | SgaraPrj1 | nova              | 'compute-01.cloud.pd.infn.it > ', 'compute-02.cloud.pd.infn.it > ' | 'availability_zone=nova', > 'filter_tenant_id=ee1865a76440481cbcff08544c7d580a', 'size=normal' | > 675f6291-6997-470d-87e1-e9ea199a379f | > +----+-----------+-------------------+--------------------------------------------------------------+----------------------------------------------------------------------------------------------+--------------------------------------+ > > The same compute nodes are shared by other projects  (for which specific > host-aggregates, as this one, have been created) > The other compute node (I have only 3 compute nodes in this small > testbed) is targeted to other projects (for which specific > host-aggregates exist) > > > This is what I have in nova.conf wrt scheduling filters: > > enabled_filters = > AggregateInstanceExtraSpecsFilter,AggregateMultiTenancyIsolation,RetryFilter,AvailabilityZoneFilter,RamFilter,CoreFilter,AggregateRamFilter,AggregateCo > reFilter,DiskFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter > > > > If I try to create a VM, I see from the scheduler log [*] that > the AggregateMultiTenancyIsolation selects only 2 compute nodes, as > expected. > > > But if I then try to migrate the very same VM, it reports that no valid > host was found: > > # nova migrate afaf2a2d-7ff8-4e52-a89a-031ee079a9ba > ERROR (BadRequest): No valid host was found. 
No valid host found for > cold migrate (HTTP 400) (Request-ID: > req-45b8afd5-9683-40a6-8416-295563e37e34) > > > And according to the scheduler log the problem is with the > AggregateMultiTenancyIsolation which returned 0 hosts (while I would > have expected one): > > 2018-05-29 11:12:56.375 19428 INFO nova.scheduler.host_manager > [req-45b8afd5-9683-40a6-8416-295563e37e34 > 9bd03f63fa9d4beb8de31e6c2f2c8d12 56c3f5c047e74a78a714\ > 38c4412e6e13 - - -\ > ] Host filter ignoring hosts: compute-02.cloud.pd.infn.it > > 2018-05-29 11:12:56.375 19428 DEBUG nova.filters > [req-45b8afd5-9683-40a6-8416-295563e37e34 > 9bd03f63fa9d4beb8de31e6c2f2c8d12 56c3f5c047e74a78a71438c4412e6e13 -\ >  - -] Starting wit\ > h 2 host(s) get_filtered_objects > /usr/lib/python2.7/site-packages/nova/filters.py:70 > 2018-05-29 11:12:56.376 19428 DEBUG nova.filters > [req-45b8afd5-9683-40a6-8416-295563e37e34 > 9bd03f63fa9d4beb8de31e6c2f2c8d12 56c3f5c047e74a78a71438c4412e6e13 -\ >  - -] Filter Aggre\ > gateInstanceExtraSpecsFilter returned 2 host(s) get_filtered_objects > /usr/lib/python2.7/site-packages/nova/filters.py:104 > 2018-05-29 11:12:56.377 19428 DEBUG > nova.scheduler.filters.aggregate_multitenancy_isolation > [req-45b8afd5-9683-40a6-8416-295563e37e34 9bd03f63fa9d4beb8de31e6c\ > 2f2c8d12 56c3f5c04\ > 7e74a78a71438c4412e6e13 - - -] (compute-01.cloud.pd.infn.it > , compute-01.cloud.pd.infn.it > ) ram: 12797MB disk: 48128MB io_ops: > 0 instances: 0 fails tenant id on\ >  aggregate host_pa\ > sses > /usr/lib/python2.7/site-packages/nova/scheduler/filters/aggregate_multitenancy_isolation.py:50 > 2018-05-29 11:12:56.378 19428 DEBUG > nova.scheduler.filters.aggregate_multitenancy_isolation > [req-45b8afd5-9683-40a6-8416-295563e37e34 9bd03f63fa9d4beb8de31e6c\ > 2f2c8d12 56c3f5c04\ > 7e74a78a71438c4412e6e13 - - -] (compute-03.cloud.pd.infn.it > , compute-03.cloud.pd.infn.it > ) ram: 8701MB disk: -4096MB io_ops: > 0 instances: 0 fails tenant id on \ > aggregate host_pas\ > ses > /usr/lib/python2.7/site-packages/nova/scheduler/filters/aggregate_multitenancy_isolation.py:50 > 2018-05-29 11:12:56.378 19428 INFO nova.filters > [req-45b8afd5-9683-40a6-8416-295563e37e34 > 9bd03f63fa9d4beb8de31e6c2f2c8d12 56c3f5c047e74a78a71438c4412e6e13 - \ > - -] Filter Aggreg\ > ateMultiTenancyIsolation returned 0 hosts > > > > I am confused ... > Any hints ? 
> > Thanks, Massimo > > [*] > > > 2018-05-29 11:09:54.328 19428 DEBUG nova.filters > [req-1a838e77-8042-4550-b157-4943445119a2 > ab573ba3ea014b778193b6922ffffe6d ee1865a76440481cbcff08544c7d580a -\ >  - -] Filter AggregateInstanceExtraSpecsFilter returned 3 host(s) > get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104 > 2018-05-29 11:09:54.330 19428 DEBUG nova.filters > [req-1a838e77-8042-4550-b157-4943445119a2 > ab573ba3ea014b778193b6922ffffe6d ee1865a76440481cbcff08544c7d580a -\ >  - -] Filter AggregateMultiTenancyIsolation returned 2 host(s) > get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104 > 2018-05-29 11:09:54.332 19428 DEBUG nova.filters > [req-1a838e77-8042-4550-b157-4943445119a2 > ab573ba3ea014b778193b6922ffffe6d ee1865a76440481cbcff08544c7d580a -\ >  - -] Filter RetryFilter returned 2 host(s) get_filtered_objects > /usr/lib/python2.7/site-packages/nova/filters.py:104 > 2018-05-29 11:09:54.332 19428 DEBUG nova.filters > [req-1a838e77-8042-4550-b157-4943445119a2 > ab573ba3ea014b778193b6922ffffe6d ee1865a76440481cbcff08544c7d580a -\ >  - -] Filter AvailabilityZoneFilter returned 2 host(s) > get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104 > 2018-05-29 11:09:54.333 19428 DEBUG nova.filters > [req-1a838e77-8042-4550-b157-4943445119a2 > ab573ba3ea014b778193b6922ffffe6d ee1865a76440481cbcff08544c7d580a -\ >  - -] Filter RamFilter returned 2 host(s) get_filtered_objects > /usr/lib/python2.7/site-packages/nova/filters.py:104 > 2018-05-29 11:09:54.334 19428 DEBUG nova.filters > [req-1a838e77-8042-4550-b157-4943445119a2 > ab573ba3ea014b778193b6922ffffe6d ee1865a76440481cbcff08544c7d580a -\ >  - -] Filter CoreFilter returned 2 host(s) get_filtered_objects > /usr/lib/python2.7/site-packages/nova/filters.py:104 > 2018-05-29 11:09:54.334 19428 DEBUG nova.filters > [req-1a838e77-8042-4550-b157-4943445119a2 > ab573ba3ea014b778193b6922ffffe6d ee1865a76440481cbcff08544c7d580a -\ >  - -] Filter AggregateRamFilter returned 2 host(s) get_filtered_objects > /usr/lib/python2.7/site-packages/nova/filters.py:104 > 2018-05-29 11:09:54.335 19428 DEBUG nova.filters > [req-1a838e77-8042-4550-b157-4943445119a2 > ab573ba3ea014b778193b6922ffffe6d ee1865a76440481cbcff08544c7d580a -\ >  - -] Filter AggregateCoreFilter returned 2 host(s) > get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104 > 2018-05-29 11:09:54.335 19428 DEBUG nova.filters > [req-1a838e77-8042-4550-b157-4943445119a2 > ab573ba3ea014b778193b6922ffffe6d ee1865a76440481cbcff08544c7d580a -\ >  - -] Filter DiskFilter returned 2 host(s) get_filtered_objects > /usr/lib/python2.7/site-packages/nova/filters.py:104 > 2018-05-29 11:09:54.336 19428 DEBUG nova.filters > [req-1a838e77-8042-4550-b157-4943445119a2 > ab573ba3ea014b778193b6922ffffe6d ee1865a76440481cbcff08544c7d580a -\ >  - -] Filter ComputeFilter returned 2 host(s) get_filtered_objects > /usr/lib/python2.7/site-packages/nova/filters.py:104 > 2018-05-29 11:09:54.337 19428 DEBUG nova.filters > [req-1a838e77-8042-4550-b157-4943445119a2 > ab573ba3ea014b778193b6922ffffe6d ee1865a76440481cbcff08544c7d580a -\ >  - -] Filter ComputeCapabilitiesFilter returned 2 host(s) > get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104 > 2018-05-29 11:09:54.338 19428 DEBUG nova.filters > [req-1a838e77-8042-4550-b157-4943445119a2 > ab573ba3ea014b778193b6922ffffe6d ee1865a76440481cbcff08544c7d580a -\ >  - -] Filter ImagePropertiesFilter returned 2 host(s) > get_filtered_objects 
/usr/lib/python2.7/site-packages/nova/filters.py:104 > 2018-05-29 11:09:54.339 19428 DEBUG nova.filters > [req-1a838e77-8042-4550-b157-4943445119a2 > ab573ba3ea014b778193b6922ffffe6d ee1865a76440481cbcff08544c7d580a -\ >  - -] Filter ServerGroupAntiAffinityFilter returned 2 host(s) > get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104 > 2018-05-29 11:09:54.339 19428 DEBUG nova.filters > [req-1a838e77-8042-4550-b157-4943445119a2 > ab573ba3ea014b778193b6922ffffe6d ee1865a76440481cbcff08544c7d580a -\ >  - -] Filter ServerGroupAffinityFilter returned 2 host(s) > get_filtered_objects /usr/lib/python2.7/site-packages/nova/filters.py:104 > > > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > From mriedemos at gmail.com Tue May 29 17:06:16 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Tue, 29 May 2018 12:06:16 -0500 Subject: [Openstack-operators] Problems with AggregateMultiTenancyIsolation while migrating an instance In-Reply-To: References: Message-ID: On 5/29/2018 11:10 AM, Jay Pipes wrote: > The hosts you are attempting to migrate *to* do not have the > filter_tenant_id property set to the same tenant ID as the compute host > 2 that originally hosted the instance. > > That is why you see this in the scheduler logs when evaluating the > fitness of compute host 1 and compute host 3: > > "fails tenant id" > > Best, > -jay Hmm, I'm not sure about that. This is the aggregate right? # nova aggregate-show 52 +----+-----------+-------------------+--------------------------------------------------------------+----------------------------------------------------------------------------------------------+--------------------------------------+ | Id | Name | Availability Zone | Hosts | Metadata | UUID | +----+-----------+-------------------+--------------------------------------------------------------+----------------------------------------------------------------------------------------------+--------------------------------------+ | 52 | SgaraPrj1 | nova | 'compute-01.cloud.pd.infn.it ', 'compute-02.cloud.pd.infn.it ' | 'availability_zone=nova', 'filter_tenant_id=ee1865a76440481cbcff08544c7d580a', 'size=normal' | 675f6291-6997-470d-87e1-e9ea199a379f | +----+-----------+-------------------+--------------------------------------------------------------+----------------------------------------------------------------------------------------------+--------------------------------------+ So compute-01 and compute-02 are in that aggregate for the same tenant ee1865a76440481cbcff08544c7d580a. From the logs, it skips compute-02 since the instance is already on that host. > 2018-05-29 11:12:56.375 19428 INFO nova.scheduler.host_manager [req-45b8afd5-9683-40a6-8416-295563e37e34 9bd03f63fa9d4beb8de31e6c2f2c8d12 56c3f5c047e74a78a714\ 38c4412e6e13 - - -\ ] Host filter ignoring hosts: compute-02.cloud.pd.infn.it So it processes compute-01 and compute-03. It should accept compute-01 since it's in the same tenant-specific aggregate and reject compute-03. But the filter rejects both hosts. It would be useful to know what the tenant_id is when comparing against the aggregate metadata: https://github.com/openstack/nova/blob/stable/ocata/nova/scheduler/filters/aggregate_multitenancy_isolation.py#L50 I'm wondering if the RequestSpec.project_id is null? 
Like, I wonder if you're hitting this bug: https://bugs.launchpad.net/nova/+bug/1739318 Although if this is a clean Ocata environment with new instances, you shouldn't have that problem. -- Thanks, Matt From doug at doughellmann.com Tue May 29 17:31:21 2018 From: doug at doughellmann.com (Doug Hellmann) Date: Tue, 29 May 2018 13:31:21 -0400 Subject: [Openstack-operators] Ops Community Documentation - first anchor point In-Reply-To: <20180528160341.be386cd2a4562d2981f470fc@redhat.com> References: <30d4f1a3668445a11fd34b271bc37e94@arcor.de> <20180524141929.2vylwguebcgkjxa3@csail.mit.edu> <20180528160341.be386cd2a4562d2981f470fc@redhat.com> Message-ID: <1527614644-sup-279@lrrr.local> Excerpts from Petr Kovar's message of 2018-05-28 16:03:41 +0200: > On Thu, 24 May 2018 07:19:29 -0700 > "Jonathan D. Proulx" wrote: > > > My intention based on current understandign would be to create a git > > repo called "osops-docs" as this fits current naming an thin initial > > document we intend to put there and the others we may adopt from > > docs-team. > > So, just to clarify, the current plan is for your group to take ownership > of the following docs? > > https://github.com/openstack/openstack-manuals/tree/a1f1748478125ccd68d90a98ccc06c7ec359d3a0/doc/ops-guide > https://github.com/openstack/openstack-manuals/tree/master/doc/arch-design > https://github.com/openstack/openstack-manuals/tree/master/doc/ha-guide Hmm, no, that's not what I thought we agreed to in the room. During the Pike cycle the Docs team indicated that it could no longer maintain the Operators Guide. That guide has *already* been handed off to new owners. They are changing from hosting it in the wiki to using a git repository. As part of that discussion, we talked about team ownership, and they indicated that they still wanted to be independent of the Documentation team. Those other repositories did come up, but without clear contributors I encouraged them to wait until they have the Operators Guide online before they try to take on any more work. At that point we can have the conversation about ownership. > > Note that there is also > https://github.com/openstack/openstack-manuals/tree/master/doc/ha-guide-draft > which you probably want to merge with the ha-guide going forward (or > retire one or the other). > > As for naming the repo, this is really up to you, but it should be > something clear and easily recognizable by your audience. > > I can help with moving some of the content around, but as Doug pointed out, > a few points about actual publishing need to be clarified first with the > infra team. The current plan is to create a SIG to own the repo so the owners can publish the results to docs.openstack.org somewhere. The exact URL is yet to be determined. > > > My understanding being they don't to have this type of > > documentention due to much reduced team size and prefer it live with > > subject matter experts. It that correct? If that's not correct I'm > > not personally opposed to trying this under docs. We'll need to > > maintain enough contributors and reviewers to make the work flow go in > > either location and that's my understanding of the basic issue not > > where it lives. > > If you want more reviewers involved, I'd recommended inviting the reviewers > from the docs group. Yes, it would be good to have reviews from the existing documentation team, especially any of them familiar with the content already and have the time to help. 
> > > This naming would also match other repos wich could be consolidated into an > > "osops" repo to rule them all. That may make sense as I think there's > > significant overlap in set of people who might contribute, but that > > can be a parallel conversation. > > > > Doug looking at new project docs I think most of it is clear enough to > > me. Since it's not code I can skip all th PyPi stuff yes? The repo > > creation seems pretty clear and I can steal the CI stuff from similar > > projects. > > Might be best to look into how https://github.com/openstack/security-doc is > configured as that repo contains a number of separate documents, all managed > by one group. That may be a good example. I still think we want 1 guide per repository, because it makes publishing much simpler. > > > I'm a little unclear on the Storyboard bit I've not done > > much contribution lately and haven't storyboarded. Is that relevant > > (or at least relevent at first) for this use case? If it is I > > probably have more questions. > > I'd suggest either having your own storyboard or launchpad project so that > users can file bugs somewhere, and give you feedback. storyboard might be a > better option since all OpenStack projects all likely to migrate to it from > launchpad at some point or another. Yes, please use storyboard for anything new. Doug > > Cheers, > pk > From jaypipes at gmail.com Tue May 29 17:44:51 2018 From: jaypipes at gmail.com (Jay Pipes) Date: Tue, 29 May 2018 13:44:51 -0400 Subject: [Openstack-operators] Problems with AggregateMultiTenancyIsolation while migrating an instance In-Reply-To: References: Message-ID: <83e68473-1c74-2a3c-2b94-2e6c8967ec90@gmail.com> On 05/29/2018 01:06 PM, Matt Riedemann wrote: > I'm wondering if the RequestSpec.project_id is null? Like, I wonder if > you're hitting this bug: > > https://bugs.launchpad.net/nova/+bug/1739318 > > Although if this is a clean Ocata environment with new instances, you > shouldn't have that problem. Looks very much like that bug, yes. Either that, or the wrong project_id is being used when attempting to migrate? Maybe the admin project_id is being used instead of the original project_id who launched the instance? There is only one way that "fails tenant ID" shows up in the logs, and it's when the project ID making the request isn't in the configured projects for the aggregate... https://github.com/openstack/nova/blob/master/nova/scheduler/filters/aggregate_multitenancy_isolation.py#L49-L51 Best, -jay From mriedemos at gmail.com Tue May 29 19:47:30 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Tue, 29 May 2018 14:47:30 -0500 Subject: [Openstack-operators] Problems with AggregateMultiTenancyIsolation while migrating an instance In-Reply-To: <83e68473-1c74-2a3c-2b94-2e6c8967ec90@gmail.com> References: <83e68473-1c74-2a3c-2b94-2e6c8967ec90@gmail.com> Message-ID: <416c49dc-1dfb-2432-ad76-be9ce71ee352@gmail.com> On 5/29/2018 12:44 PM, Jay Pipes wrote: > Either that, or the wrong project_id is being used when attempting to > migrate? Maybe the admin project_id is being used instead of the > original project_id who launched the instance? Could be, but we should be pulling the request spec from the database which was created when the instance was created. There is some shim code from Newton which will create an essentially fake request spec on-demand when doing a move operation if the instance was created before newton, which could go back to that bug I was referring to. 
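To make the debugging suggestion in this thread concrete: it amounts to a one-line local change to the Ocata filter linked above so that the failing comparison also logs the project id it is testing. Roughly (variable names are approximations of that file; treat this as a temporary diagnostic patch, not upstream code):

# inside AggregateMultiTenancyIsolation.host_passes(), where the filter today
# only logs the host, also log the tenant/project id being compared:
if tenant_id not in configured_tenant_ids:
    LOG.debug("%(host)s fails tenant id %(tenant)s on aggregate",
              {'host': host_state, 'tenant': tenant_id})
    return False

If that extra field shows the admin project rather than the instance owner's project, the filter is doing its job and the problem sits in how the request spec was built, which is where this thread ends up.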
Massimo - can you clarify if this is a new server created in your Ocata test environment that you're trying to move? Or is this a server created before Ocata? -- Thanks, Matt From massimo.sgaravatto at gmail.com Tue May 29 20:07:39 2018 From: massimo.sgaravatto at gmail.com (Massimo Sgaravatto) Date: Tue, 29 May 2018 22:07:39 +0200 Subject: [Openstack-operators] Problems with AggregateMultiTenancyIsolation while migrating an instance In-Reply-To: <416c49dc-1dfb-2432-ad76-be9ce71ee352@gmail.com> References: <83e68473-1c74-2a3c-2b94-2e6c8967ec90@gmail.com> <416c49dc-1dfb-2432-ad76-be9ce71ee352@gmail.com> Message-ID: The VM that I am trying to migrate was created when the Cloud was already running Ocata Cheers, Massimo On Tue, May 29, 2018 at 9:47 PM, Matt Riedemann wrote: > On 5/29/2018 12:44 PM, Jay Pipes wrote: > >> Either that, or the wrong project_id is being used when attempting to >> migrate? Maybe the admin project_id is being used instead of the original >> project_id who launched the instance? >> > > Could be, but we should be pulling the request spec from the database > which was created when the instance was created. There is some shim code > from Newton which will create an essentially fake request spec on-demand > when doing a move operation if the instance was created before newton, > which could go back to that bug I was referring to. > > Massimo - can you clarify if this is a new server created in your Ocata > test environment that you're trying to move? Or is this a server created > before Ocata? > > -- > > Thanks, > > Matt > > > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Tue May 29 23:01:24 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Tue, 29 May 2018 18:01:24 -0500 Subject: [Openstack-operators] Problems with AggregateMultiTenancyIsolation while migrating an instance In-Reply-To: References: <83e68473-1c74-2a3c-2b94-2e6c8967ec90@gmail.com> <416c49dc-1dfb-2432-ad76-be9ce71ee352@gmail.com> Message-ID: <4f3531e3-e3bc-c5a6-5a5c-9621dd799127@gmail.com> On 5/29/2018 3:07 PM, Massimo Sgaravatto wrote: > The VM that I am trying to migrate was created when the Cloud was > already running Ocata OK, I'd added the tenant_id variable in scope to the log message here: https://github.com/openstack/nova/blob/stable/ocata/nova/scheduler/filters/aggregate_multitenancy_isolation.py#L50 And make sure when it fails, it matches what you'd expect. If it's None or '' or something weird then we have a bug. -- Thanks, Matt From bitskrieg at bitskrieg.net Wed May 30 01:23:50 2018 From: bitskrieg at bitskrieg.net (Chris Apsey) Date: Tue, 29 May 2018 21:23:50 -0400 Subject: [Openstack-operators] attaching network cards to VMs taking a very long time In-Reply-To: References: <715adc7d-64f6-9545-1bf6-5eb13fb1d991@gmail.com> <350f070b9d654a0a5430fafb07bcc1d41c98d2f8.camel@emag.ro> <43cd8579c761a13dcc81e6ffc9a69089fb421cda.camel@emag.ro> <6ec89b3767f26b73451fa490d18ed7756266d3ec.camel@emag.ro> Message-ID: <163aea4dc70.2784.5f0d7f2baa7831a2bbe6450f254d9a24@bitskrieg.net> I want to echo the effectiveness of this change - we had vif failures when launching more than 50 or so cirros instances simultaneously, but moving to daemon mode made this issue disappear and we've tested 5x that amount. 
This has been the single biggest scalability improvement to date. This option should be the default in the official docs. On May 24, 2018 05:55:49 Saverio Proto wrote: > Glad to hear it! > Always monitor rabbitmq queues to identify bottlenecks !! :) > > Cheers > > Saverio > > > Il gio 24 mag 2018, 11:07 Radu Popescu | eMAG, Technology > ha scritto: > Hi, > > did the change yesterday. Had no issue this morning with neutron not being > able to move fast enough. Still, we had some storage issues, but that's > another thing. > Anyway, I'll leave it like this for the next few days and report back in > case I get the same slow neutron errors. > > Thanks a lot! > Radu > > On Wed, 2018-05-23 at 10:08 +0000, Radu Popescu | eMAG, Technology wrote: >> Hi, >> >> actually, I didn't know about that option. I'll enable it right now. >> Testing is done every morning at about 4:00AM ..so I'll know tomorrow >> morning if it changed anything. >> >> Thanks, >> Radu >> >> On Tue, 2018-05-22 at 15:30 +0200, Saverio Proto wrote: >>> Sorry email went out incomplete. >>> Read this: >>> https://cloudblog.switch.ch/2017/08/28/starting-1000-instances-on-switchengines/ >>> >>> make sure that Openstack rootwrap configured to work in daemon mode >>> >>> Thank you >>> >>> Saverio >>> >>> >>> 2018-05-22 15:29 GMT+02:00 Saverio Proto : >>>> >>> >>> Hello Radu, >>> >>> do you have the Openstack rootwrap configured to work in daemon mode ? >>> >>> please read this article: >>> >>> 2018-05-18 10:21 GMT+02:00 Radu Popescu | eMAG, Technology >>> : >>>> >>> >>> Hi, >>> >>> so, nova says the VM is ACTIVE and actually boots with no network. We are >>> setting some metadata that we use later on and have cloud-init for different >>> tasks. >>> So, VM is up, OS is running, but network is working after a random amount of >>> time, that can get to around 45 minutes. Thing is, is not happening to all >>> VMs in that test (around 300), but it's happening to a fair amount - around >>> 25%. >>> >>> I can see the callback coming few seconds after neutron openvswitch agent >>> says it's completed the setup. My question is, why is it taking so long for >>> nova openvswitch agent to configure the port? I can see the port up in both >>> host OS and openvswitch. I would assume it's doing the whole namespace and >>> iptables setup. But still, 30 minutes? Seems a lot! >>> >>> Thanks, >>> Radu >>> >>> On Thu, 2018-05-17 at 11:50 -0400, George Mihaiescu wrote: >>> >>> We have other scheduled tests that perform end-to-end (assign floating IP, >>> ssh, ping outside) and never had an issue. >>> I think we turned it off because the callback code was initially buggy and >>> nova would wait forever while things were in fact ok, but I'll change >>> "vif_plugging_is_fatal = True" and "vif_plugging_timeout = 300" and run >>> another large test, just to confirm. >>> >>> We usually run these large tests after a version upgrade to test the APIs >>> under load. >>> >>> >>> >>> On Thu, May 17, 2018 at 11:42 AM, Matt Riedemann >>> wrote: >>> >>> On 5/17/2018 9:46 AM, George Mihaiescu wrote: >>> >>> and large rally tests of 500 instances complete with no issues. >>> >>> >>> Sure, except you can't ssh into the guests. >>> >>> The whole reason the vif plugging is fatal and timeout and callback code was >>> because the upstream CI was unstable without it. The server would report as >>> ACTIVE but the ports weren't wired up so ssh would fail. Having an ACTIVE >>> guest that you can't actually do anything with is kind of pointless. 
>>> >>> _______________________________________________ >>> >>> OpenStack-operators mailing list >>> >>> OpenStack-operators at lists.openstack.org >>> >>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators >>> >>> >>> >>> _______________________________________________ >>> OpenStack-operators mailing list >>> OpenStack-operators at lists.openstack.org >>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators >> _______________________________________________ >> OpenStack-operators mailing list >> OpenStack-operators at lists.openstack.org >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators > _______________________________________________ > OpenStack-operators mailing list > OpenStack-operators at lists.openstack.org > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators -------------- next part -------------- An HTML attachment was scrubbed... URL: From radu.popescu at emag.ro Wed May 30 08:02:30 2018 From: radu.popescu at emag.ro (Radu Popescu | eMAG, Technology) Date: Wed, 30 May 2018 08:02:30 +0000 Subject: [Openstack-operators] attaching network cards to VMs taking a very long time In-Reply-To: <163aea4dc70.2784.5f0d7f2baa7831a2bbe6450f254d9a24@bitskrieg.net> References: <715adc7d-64f6-9545-1bf6-5eb13fb1d991@gmail.com> <350f070b9d654a0a5430fafb07bcc1d41c98d2f8.camel@emag.ro> <43cd8579c761a13dcc81e6ffc9a69089fb421cda.camel@emag.ro> <6ec89b3767f26b73451fa490d18ed7756266d3ec.camel@emag.ro> <163aea4dc70.2784.5f0d7f2baa7831a2bbe6450f254d9a24@bitskrieg.net> Message-ID: <46be77c4a7b362245a729f39af058cc93ee8d769.camel@emag.ro> Hi, just to let you know. Problem is now gone. Instances boot up with working network interface. Thanks a lot, Radu On Tue, 2018-05-29 at 21:23 -0400, Chris Apsey wrote: I want to echo the effectiveness of this change - we had vif failures when launching more than 50 or so cirros instances simultaneously, but moving to daemon mode made this issue disappear and we've tested 5x that amount. This has been the single biggest scalability improvement to date. This option should be the default in the official docs. On May 24, 2018 05:55:49 Saverio Proto wrote: Glad to hear it! Always monitor rabbitmq queues to identify bottlenecks !! :) Cheers Saverio Il gio 24 mag 2018, 11:07 Radu Popescu | eMAG, Technology > ha scritto: Hi, did the change yesterday. Had no issue this morning with neutron not being able to move fast enough. Still, we had some storage issues, but that's another thing. Anyway, I'll leave it like this for the next few days and report back in case I get the same slow neutron errors. Thanks a lot! Radu On Wed, 2018-05-23 at 10:08 +0000, Radu Popescu | eMAG, Technology wrote: Hi, actually, I didn't know about that option. I'll enable it right now. Testing is done every morning at about 4:00AM ..so I'll know tomorrow morning if it changed anything. Thanks, Radu On Tue, 2018-05-22 at 15:30 +0200, Saverio Proto wrote: Sorry email went out incomplete. Read this: https://cloudblog.switch.ch/2017/08/28/starting-1000-instances-on-switchengines/ make sure that Openstack rootwrap configured to work in daemon mode Thank you Saverio 2018-05-22 15:29 GMT+02:00 Saverio Proto >: Hello Radu, do you have the Openstack rootwrap configured to work in daemon mode ? please read this article: 2018-05-18 10:21 GMT+02:00 Radu Popescu | eMAG, Technology >: Hi, so, nova says the VM is ACTIVE and actually boots with no network. 
We are setting some metadata that we use later on and have cloud-init for different tasks. So, VM is up, OS is running, but network is working after a random amount of time, that can get to around 45 minutes. Thing is, is not happening to all VMs in that test (around 300), but it's happening to a fair amount - around 25%. I can see the callback coming few seconds after neutron openvswitch agent says it's completed the setup. My question is, why is it taking so long for nova openvswitch agent to configure the port? I can see the port up in both host OS and openvswitch. I would assume it's doing the whole namespace and iptables setup. But still, 30 minutes? Seems a lot! Thanks, Radu On Thu, 2018-05-17 at 11:50 -0400, George Mihaiescu wrote: We have other scheduled tests that perform end-to-end (assign floating IP, ssh, ping outside) and never had an issue. I think we turned it off because the callback code was initially buggy and nova would wait forever while things were in fact ok, but I'll change "vif_plugging_is_fatal = True" and "vif_plugging_timeout = 300" and run another large test, just to confirm. We usually run these large tests after a version upgrade to test the APIs under load. On Thu, May 17, 2018 at 11:42 AM, Matt Riedemann > wrote: On 5/17/2018 9:46 AM, George Mihaiescu wrote: and large rally tests of 500 instances complete with no issues. Sure, except you can't ssh into the guests. The whole reason the vif plugging is fatal and timeout and callback code was because the upstream CI was unstable without it. The server would report as ACTIVE but the ports weren't wired up so ssh would fail. Having an ACTIVE guest that you can't actually do anything with is kind of pointless. _______________________________________________ OpenStack-operators mailing list OpenStack-operators at lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators _______________________________________________ OpenStack-operators mailing list OpenStack-operators at lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators _______________________________________________ OpenStack-operators mailing list OpenStack-operators at lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators _______________________________________________ OpenStack-operators mailing list OpenStack-operators at lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators -------------- next part -------------- An HTML attachment was scrubbed... URL: From massimo.sgaravatto at gmail.com Wed May 30 10:21:32 2018 From: massimo.sgaravatto at gmail.com (Massimo Sgaravatto) Date: Wed, 30 May 2018 12:21:32 +0200 Subject: [Openstack-operators] Problems with AggregateMultiTenancyIsolation while migrating an instance In-Reply-To: <4f3531e3-e3bc-c5a6-5a5c-9621dd799127@gmail.com> References: <83e68473-1c74-2a3c-2b94-2e6c8967ec90@gmail.com> <416c49dc-1dfb-2432-ad76-be9ce71ee352@gmail.com> <4f3531e3-e3bc-c5a6-5a5c-9621dd799127@gmail.com> Message-ID: The problem is indeed with the tenant_id When I create a VM, tenant_id is ee1865a76440481cbcff08544c7d580a (SgaraPrj1), as expected But when, as admin, I run the "nova migrate" command to migrate the very same instance, the tenant_id is 56c3f5c047e74a78a71438c4412e6e13 (admin) ! 
Cheers, Massimo On Wed, May 30, 2018 at 1:01 AM, Matt Riedemann wrote: > On 5/29/2018 3:07 PM, Massimo Sgaravatto wrote: > >> The VM that I am trying to migrate was created when the Cloud was already >> running Ocata >> > > OK, I'd added the tenant_id variable in scope to the log message here: > > https://github.com/openstack/nova/blob/stable/ocata/nova/sch > eduler/filters/aggregate_multitenancy_isolation.py#L50 > > And make sure when it fails, it matches what you'd expect. If it's None or > '' or something weird then we have a bug. > > -- > > Thanks, > > Matt > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mriedemos at gmail.com Wed May 30 14:30:51 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 30 May 2018 09:30:51 -0500 Subject: [Openstack-operators] attaching network cards to VMs taking a very long time In-Reply-To: <163aea4dc70.2784.5f0d7f2baa7831a2bbe6450f254d9a24@bitskrieg.net> References: <715adc7d-64f6-9545-1bf6-5eb13fb1d991@gmail.com> <350f070b9d654a0a5430fafb07bcc1d41c98d2f8.camel@emag.ro> <43cd8579c761a13dcc81e6ffc9a69089fb421cda.camel@emag.ro> <6ec89b3767f26b73451fa490d18ed7756266d3ec.camel@emag.ro> <163aea4dc70.2784.5f0d7f2baa7831a2bbe6450f254d9a24@bitskrieg.net> Message-ID: On 5/29/2018 8:23 PM, Chris Apsey wrote: > I want to echo the effectiveness of this change - we had vif failures > when launching more than 50 or so cirros instances simultaneously, but > moving to daemon mode made this issue disappear and we've tested 5x that > amount.  This has been the single biggest scalability improvement to > date.  This option should be the default in the official docs. This is really good feedback. I'm not sure if there is any kind of centralized performance/scale-related documentation, does the LCOO team [1] have something that's current? There are also the performance docs [2] but that looks pretty stale. We could add a note to the neutron rootwrap configuration option such that if you're running into timeout issues you could consider running that in daemon mode, but it's probably not very discoverable. In fact, I couldn't find anything about it in the neutron docs, I only found this [3] because I know it's defined in oslo.rootwrap (I don't expect everyone to know where this is defined). I found root_helper_daemon in the neutron docs [4] but it doesn't mention anything about performance or related options, and it just makes it sound like it matters for xenserver, which I'd gloss over if I were using libvirt. The root_helper_daemon config option help in neutron should probably refer to the neutron-rootwrap-daemon which is in the setup.cfg [5]. For better discoverability of this, probably the best place to mention it is in the nova vif_plugging_timeout configuration option, since I expect that's the first place operators will be looking when they start hitting timeouts during vif plugging at scale. I can start pushing some docs patches and report back here for review help. 
[1] https://wiki.openstack.org/wiki/LCOO [2] https://docs.openstack.org/developer/performance-docs/ [3] https://docs.openstack.org/oslo.rootwrap/latest/user/usage.html#daemon-mode [4] https://docs.openstack.org/neutron/latest/configuration/neutron.html#agent.root_helper_daemon [5] https://github.com/openstack/neutron/blob/f486f0/setup.cfg#L54 -- Thanks, Matt From mriedemos at gmail.com Wed May 30 14:41:32 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 30 May 2018 09:41:32 -0500 Subject: [Openstack-operators] Problems with AggregateMultiTenancyIsolation while migrating an instance In-Reply-To: References: <83e68473-1c74-2a3c-2b94-2e6c8967ec90@gmail.com> <416c49dc-1dfb-2432-ad76-be9ce71ee352@gmail.com> <4f3531e3-e3bc-c5a6-5a5c-9621dd799127@gmail.com> Message-ID: On 5/30/2018 5:21 AM, Massimo Sgaravatto wrote: > The problem is indeed with the tenant_id > > When I create a VM, tenant_id is ee1865a76440481cbcff08544c7d580a > (SgaraPrj1), as expected > > But when, as admin, I run the "nova migrate" command to migrate the very > same instance, the tenant_id is 56c3f5c047e74a78a71438c4412e6e13 (admin) ! OK that's good information. Tracing the code for cold migrate in ocata, we get the request spec that was created when the instance was created here: https://github.com/openstack/nova/blob/stable/ocata/nova/compute/api.py#L3339 As I mentioned earlier, if it was cold migrating an instance created before Newton and the online data migration wasn't run on it, we'd create a temporary request spec here: https://github.com/openstack/nova/blob/stable/ocata/nova/conductor/manager.py#L263 But that shouldn't be the case in your scenario. Right before we call the scheduler, for some reason, we completely ignore the request spec retrieved in the API, and re-create it from local scope variables in conductor: https://github.com/openstack/nova/blob/stable/ocata/nova/conductor/tasks/migrate.py#L50 And *that* is precisely where this breaks down and takes the project_id from the current context (admin) rather than the instance: https://github.com/openstack/nova/blob/stable/ocata/nova/objects/request_spec.py#L407 Thanks for your patience in debugging this Massimo! I'll get a bug reported and patch posted to fix it. -- Thanks, Matt From mriedemos at gmail.com Wed May 30 18:06:21 2018 From: mriedemos at gmail.com (Matt Riedemann) Date: Wed, 30 May 2018 13:06:21 -0500 Subject: [Openstack-operators] Problems with AggregateMultiTenancyIsolation while migrating an instance In-Reply-To: References: <83e68473-1c74-2a3c-2b94-2e6c8967ec90@gmail.com> <416c49dc-1dfb-2432-ad76-be9ce71ee352@gmail.com> <4f3531e3-e3bc-c5a6-5a5c-9621dd799127@gmail.com> Message-ID: On 5/30/2018 9:41 AM, Matt Riedemann wrote: > Thanks for your patience in debugging this Massimo! I'll get a bug > reported and patch posted to fix it. I'm tracking the problem with this bug: https://bugs.launchpad.net/nova/+bug/1774205 I found that this has actually been fixed since Pike: https://review.openstack.org/#/c/449640/ But I've got a patch up for another related issue, and a functional test to avoid regressions which I can also use when backporting the fix to stable/ocata. 
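To summarise the root cause for anyone hitting the same symptom on Ocata: during a cold migration the conductor rebuilt the request spec from the context of the user running the migration, so the spec's project_id became the admin's project instead of the instance owner's, and tenant-restricted aggregates stopped matching. Schematically (simplified pseudo-code; build_request_spec is a placeholder name, not the real nova call):

# what effectively happened during "nova migrate" run as admin (Ocata):
request_spec = build_request_spec(project_id=context.project_id)
# context.project_id is the admin project -> "fails tenant id on aggregate"

# what the fix referenced above effectively does instead:
request_spec = build_request_spec(project_id=instance.project_id)
# the filter now compares the project that owns the instance, so the
# SgaraPrj1 aggregate matches again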
-- Thanks, Matt From massimo.sgaravatto at gmail.com Thu May 31 06:34:42 2018 From: massimo.sgaravatto at gmail.com (Massimo Sgaravatto) Date: Thu, 31 May 2018 08:34:42 +0200 Subject: [Openstack-operators] Problems with AggregateMultiTenancyIsolation while migrating an instance In-Reply-To: References: <83e68473-1c74-2a3c-2b94-2e6c8967ec90@gmail.com> <416c49dc-1dfb-2432-ad76-be9ce71ee352@gmail.com> <4f3531e3-e3bc-c5a6-5a5c-9621dd799127@gmail.com> Message-ID: Thanks a lot !! On Wed, May 30, 2018 at 8:06 PM, Matt Riedemann wrote: > On 5/30/2018 9:41 AM, Matt Riedemann wrote: > >> Thanks for your patience in debugging this Massimo! I'll get a bug >> reported and patch posted to fix it. >> > > I'm tracking the problem with this bug: > > https://bugs.launchpad.net/nova/+bug/1774205 > > I found that this has actually been fixed since Pike: > > https://review.openstack.org/#/c/449640/ > > But I've got a patch up for another related issue, and a functional test > to avoid regressions which I can also use when backporting the fix to > stable/ocata. > > -- > > Thanks, > > Matt > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mrhillsman at gmail.com Thu May 31 18:03:20 2018 From: mrhillsman at gmail.com (Melvin Hillsman) Date: Thu, 31 May 2018 13:03:20 -0500 Subject: [Openstack-operators] OpenLab Cross-community Impact Message-ID: Hi everyone, I know we have sent out quite a bit of information over the past few days with the OpenStack Summit and other updates recently. Additionally there are plenty of meetings we all attend. I just want to take time to point to something very significant in my opinion and again give big thanks to Chris, Dims, Liusheng, Chenrui, Zhuli, Joe (gophercloud), and anyone else contributing to OpenLab. A member of the release team working on the testing infrastructure for Kubernetes did a shoutout to the team for the following: (AishSundar) Shoutout to @dims and OpenStack team for quickly getting their 1.11 Conformance results piped to CI runs and contributing results to Conformance dashboard ! https://k8s-testgrid.appspot.com/sig-release-1.11-all#Conformance%20-%20OpenStack&show-stale-tests= Here is why this is significant and those working on this who I previously mentioned should get recognition: (hogepodge) OpenStack and GCE are the first two clouds that will release block on conformance testing failures. Thanks @dims for building out the test pipeline and @mrhillsman for leading the OpenLab efforts that are reporting back to the test grid. @RuiChen for his contributions to the testing effort. Amazing work for the last six months. In other words, if the external cloud provider ci conformance tests we do in OpenLab are not passing, it will be one of the signals used for blocking the release. OpenStack and GCE are the first two clouds to achieve this and it is a significant accomplishment for the OpenLab team and the OpenStack community overall regarding our relationship with the Kubernetes community. Thanks again Chris, Dims, Joe, Liusheng, Chenrui, and Zhuli for the work you have done and continue to do in this space. Personally I hope we take a moment to really consider this milestone and work to ensure OpenLab's continued success as we embark on working on other integrations. We started OpenLab hoping we could make substantial impact specifically for the ecosystem that builds on top of OpenStack and this is evidence we can and should do more. 
--
Kind regards,

Melvin Hillsman
mrhillsman at gmail.com
mobile: (832) 264-2646
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From melwittt at gmail.com Thu May 31 18:35:43 2018
From: melwittt at gmail.com (melanie witt)
Date: Thu, 31 May 2018 11:35:43 -0700
Subject: [Openstack-operators] [nova] proposal to postpone nova-network core functionality removal to Stein
Message-ID: <29873b6f-8a3c-ae6e-0756-c90d2c52a306@gmail.com>

Hello Operators and Devs,

This cycle at the PTG, we had decided to start making some progress
toward removing nova-network [1] (thanks to those who have helped!)
and so far, we've landed some patches to extract common network
utilities from nova-network core functionality into separate utility
modules. And we've started proposing removal of nova-network REST APIs
[2].

At the cells v2 sync with operators forum session at the summit [3],
we learned that CERN is in the middle of migrating from nova-network
to neutron and that holding off on removal of nova-network core
functionality until Stein would help them out a lot to have a safety
net as they continue progressing through the migration.

If we recall correctly, they did say that removal of the nova-network
REST APIs would not impact their migration and Surya Seetharaman is
double-checking about that and will get back to us. If so, we were
thinking we can go ahead and work on nova-network REST API removals
this cycle to make some progress while holding off on removing the
core functionality of nova-network until Stein.

I wanted to send this to the ML to let everyone know what we were
thinking about this and to receive any additional feedback folks might
have about this plan.

Thanks,
-melanie

[1] https://etherpad.openstack.org/p/nova-ptg-rocky L301
[2] https://review.openstack.org/567682
[3] https://etherpad.openstack.org/p/YVR18-cellsv2-migration-sync-with-operators L30

From glavado at whitestack.com Thu May 31 19:13:05 2018
From: glavado at whitestack.com (Gianpietro Lavado)
Date: Thu, 31 May 2018 21:13:05 +0200
Subject: [Openstack-operators] [publiccloud] Feedback requested on use cases
Message-ID: 

Hi team,

As mentioned briefly during Vancouver's forum meeting, we are working on
deploying OpenStack at a new public cloud operator, who found some gaps
compared to their current platform (oVirt). We agreed to fix them and take
the opportunity to contribute upstream if applicable.

Since my experience is more related to the private cloud space, I'm not
sure if these cases are already there or if it's worth opening them for
discussion in Launchpad. So if possible, I would like to receive a little
bit of feedback before documenting them at a detailed level. I have a
longer list, but here are the three main ones:

1. *Instance HA:* being able to mark some VMs with priority levels so that
when a compute node goes down, they are automatically rebuilt on some
other node.

2. *Compute node rebalancing:* rely on live migrations to periodically and
automatically rebalance the capacity of compute nodes.

3. *VM Agent:* something like the oVirt Guest Agent, to be able to enhance
interaction and get richer information from the VMs.

Any feedback is welcome. Thanks!

Gianpietro
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From mriedemos at gmail.com Thu May 31 20:06:05 2018
From: mriedemos at gmail.com (Matt Riedemann)
Date: Thu, 31 May 2018 15:06:05 -0500
Subject: [Openstack-operators] [openstack-dev] [nova] proposal to postpone nova-network core functionality removal to Stein
In-Reply-To: <1391ee64-90f7-9414-9168-3a4caf495555@gmail.com>
References: <29873b6f-8a3c-ae6e-0756-c90d2c52a306@gmail.com> <1391ee64-90f7-9414-9168-3a4caf495555@gmail.com>
Message-ID: 

+openstack-operators

On 5/31/2018 3:04 PM, Matt Riedemann wrote:
> On 5/31/2018 1:35 PM, melanie witt wrote:
>>
>> This cycle at the PTG, we had decided to start making some progress
>> toward removing nova-network [1] (thanks to those who have helped!)
>> and so far, we've landed some patches to extract common network
>> utilities from nova-network core functionality into separate utility
>> modules. And we've started proposing removal of nova-network REST APIs
>> [2].
>>
>> At the cells v2 sync with operators forum session at the summit [3],
>> we learned that CERN is in the middle of migrating from nova-network
>> to neutron and that holding off on removal of nova-network core
>> functionality until Stein would help them out a lot to have a safety
>> net as they continue progressing through the migration.
>>
>> If we recall correctly, they did say that removal of the nova-network
>> REST APIs would not impact their migration and Surya Seetharaman is
>> double-checking about that and will get back to us. If so, we were
>> thinking we can go ahead and work on nova-network REST API removals
>> this cycle to make some progress while holding off on removing the
>> core functionality of nova-network until Stein.
>>
>> I wanted to send this to the ML to let everyone know what we were
>> thinking about this and to receive any additional feedback folks might
>> have about this plan.
>>
>> Thanks,
>> -melanie
>>
>> [1] https://etherpad.openstack.org/p/nova-ptg-rocky L301
>> [2] https://review.openstack.org/567682
>> [3]
>> https://etherpad.openstack.org/p/YVR18-cellsv2-migration-sync-with-operators
>> L30
>
> As a reminder, this is the etherpad I started to document the nova-net
> specific compute REST APIs which are candidates for removal:
>
> https://etherpad.openstack.org/p/nova-network-removal-rocky
>

--

Thanks,

Matt

From mriedemos at gmail.com Thu May 31 21:46:54 2018
From: mriedemos at gmail.com (Matt Riedemann)
Date: Thu, 31 May 2018 16:46:54 -0500
Subject: [Openstack-operators] attaching network cards to VMs taking a very long time
In-Reply-To: 
References: <715adc7d-64f6-9545-1bf6-5eb13fb1d991@gmail.com> <350f070b9d654a0a5430fafb07bcc1d41c98d2f8.camel@emag.ro> <43cd8579c761a13dcc81e6ffc9a69089fb421cda.camel@emag.ro> <6ec89b3767f26b73451fa490d18ed7756266d3ec.camel@emag.ro> <163aea4dc70.2784.5f0d7f2baa7831a2bbe6450f254d9a24@bitskrieg.net>
Message-ID: <914e8aec-2a25-e7d0-270f-725b0aeba9d5@gmail.com>

On 5/30/2018 9:30 AM, Matt Riedemann wrote:
>
> I can start pushing some docs patches and report back here for review
> help.

Here are the docs patches in both nova and neutron:

https://review.openstack.org/#/q/topic:bug/1774217+(status:open+OR+status:merged)

--

Thanks,

Matt