[Openstack-operators] [HA] RFC: user story including hypervisor reservation / host maintenance / storage AZs / event history (fwd)
Juvonen, Tomi (Nokia - FI/Espoo)
tomi.juvonen at nokia.com
Wed Jun 29 09:43:59 UTC 2016
Thanks again for the help and comments, Adam.
I need to look at the other discussions you have linked here. That will take
some time, as I am going on holiday on Friday and coming back in August.
Meanwhile, I am beginning to think that this new field in Nova would really be
just for maintenance, and that there may be no need for a URL to something
external. Any external tool could consume the notification anyway, and further
logic could live inside the tool. The downside is that, because the Nova team
did not want a big change for this, "just one new field" is not that usable
for carrying different kinds of maintenance state information. Some different
"states" one might need:
- Maintenance window (begin time - end time; if the end time is missing, the
HW is not coming back. This is needed if a VM would be left on the host
during maintenance)
- In maintenance (visible to VMs left on the host)
- Test (only the operator can use the host after maintenance, to test that it
works. Needs a new "MaintenanceModeFilter" for this purpose)
OK, looking at these 3 "states", 2 could be reserved words that one can expect:
- In maintenance
- Test
In the normal running situation we would know that there is no value, but
"maintenance window" could be tricky. One might also want to give more
details about it, which would mean putting them behind some URL. Then one
might need to know the difference between a system that has not been
maintained and one that has, in order to launch a VM onto one or the other.
As some kind of state, that might be ugly, like a running version number, and
not convenient if some "MaintenanceModeFilter" again needed to map to it.
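To make the idea concrete, here is a rough sketch of how a single string
field might be mapped to the states above, and how a filter could use it.
Everything here is an assumption for illustration only: the reserved-word
spellings, the "<begin>/<end>" ISO 8601 format for the window, and the
names parse_field and host_passes are all hypothetical, and none of this is
existing Nova code.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class MaintenanceState(Enum):
    NORMAL = "normal"                  # field is empty: host in normal use
    WINDOW = "window"                  # maintenance planned for a time window
    IN_MAINTENANCE = "in_maintenance"  # maintenance is ongoing
    TEST = "test"                      # operator-only testing after maintenance


# Hypothetical spellings of the two reserved words.
RESERVED = {
    "in maintenance": MaintenanceState.IN_MAINTENANCE,
    "test": MaintenanceState.TEST,
}


@dataclass
class HostMaintenance:
    state: MaintenanceState
    begin: Optional[datetime] = None
    end: Optional[datetime] = None  # None for WINDOW: the HW is not coming back


def parse_field(value: Optional[str]) -> HostMaintenance:
    """Interpret the single string field (the format is an assumption)."""
    if not value or not value.strip():
        return HostMaintenance(MaintenanceState.NORMAL)
    text = value.strip()
    reserved = RESERVED.get(text.lower())
    if reserved is not None:
        return HostMaintenance(reserved)
    # Anything else is read as "<begin>/<end>" in ISO 8601; the end is optional.
    begin_s, _, end_s = text.partition("/")
    begin = datetime.fromisoformat(begin_s)
    end = datetime.fromisoformat(end_s) if end_s else None
    return HostMaintenance(MaintenanceState.WINDOW, begin, end)


def host_passes(value: Optional[str], requester_is_operator: bool) -> bool:
    """Filter decision: may a new VM be scheduled onto this host?"""
    state = parse_field(value).state
    if state is MaintenanceState.NORMAL:
        return True
    if state is MaintenanceState.TEST:
        return requester_is_operator
    # WINDOW and IN_MAINTENANCE: this sketch keeps new VMs off the host.
    return False
```

Even this small sketch shows the problem: the window has to be squeezed into
the same string as the reserved words, so every tool must agree on the format.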
I need to keep looking for the best solution, and will also discuss this with
the Nova guys and in review when I am back from holiday.
Br,
Tomi
> -----Original Message-----
> From: Adam Spiers [mailto:aspiers at suse.com]
> Sent: Tuesday, June 28, 2016 6:42 PM
> To: Juvonen, Tomi (Nokia - FI/Espoo) <tomi.juvonen at nokia.com>
> Cc: openstack-operators mailing list
> <openstack-operators at lists.openstack.org>
> Subject: Re: [Openstack-operators] [HA] RFC: user story including
> hypervisor reservation / host maintenance / storage AZs / event history
> (fwd)
>
> Juvonen, Tomi (Nokia - FI/Espoo) <tomi.juvonen at nokia.com> wrote:
> > Thank you very much from the interest. Need to look over other
> > discussion and perhaps have a session in Barcelona to look the
> > way forward after change in Nova.
>
> Indeed, sounds good!
>
> > > -----Original Message-----
> > > From: Adam Spiers [mailto:aspiers at suse.com]
> > > Sent: Monday, June 20, 2016 4:43 PM
> > > To: Juvonen, Tomi (Nokia - FI/Espoo) <tomi.juvonen at nokia.com>
> > > Cc: openstack-operators mailing list
> > > <openstack-operators at lists.openstack.org>
> > > Subject: Re: [Openstack-operators] [HA] RFC: user story including
> > > hypervisor reservation / host maintenance / storage AZs / event history
> > > (fwd)
> > >
> > > Hi Tomi,
> > >
> > > Juvonen, Tomi (Nokia - FI/Espoo) <tomi.juvonen at nokia.com> wrote:
> > > > I'm working in the OPNFV Doctor project that is about fault
> > > > management and maintenance (NFV). The goal of the project is to
> > > > build fault management and maintenance framework for high
> > > > availability of Network Services on top of virtualized
> > > > infrastructure.
> > > >
> > > > https://wiki.opnfv.org/display/doctor
> > > >
> > > > Currently there is already landed effort to OpenStack to have
> > > > ability to detect failures fast, change states in OpenStack (Nova),
> > > > add state information that was missing and also to expose that to
> > > > owner of a VM. Also alarm is triggered. By all this one can now rely
> > > > the states and get notice about faults in a split second. Surely
> > > > with system configured monitor different faults and make actions
> > > > based configured policies, or leave some actions for consumers of
> > > > the alarms risen.
> > >
> > > Sounds very interesting - thanks. Does this really have to be limited
> > > to OPNFV though? It sounds like it would be very useful within
> > > OpenStack generally.
> > Surely not just for OPNFV, but for all operators.
>
> Right - so why is it part of the OPNFV project? That gives the
> impression that it would only be usable in NFV contexts.
>
> > If playing with the idea
> > of having link to some external tool to have more than
> > "host_maintenance_reason", like it now would seem some more generic
> > "host_details", where one could have external REST API to call to have
> > any wanted host specific details that one would like to expose also to
> > tenant/owner of server.
>
> Sounds like you are talking about some kind of "whiteboard" feature
> per instance which would act as a sort of communication channel
> between the project user/owner and the cloud operator? Can you
> describe a use case which is unrelated to maintenance?
>
> > If having that tool it could also have maintenance
> > or host failure specific scenarios implemented. Could have admin to do
> > things manually, or configure tool VNF / instance specifically to do some
> > actions..
>
> I think we should distinguish between a place to store freeform
> human-readable text, and a way for the cloud operator to plan and then
> carry out maintenance actions in a manner which would be communicated
> to affected users. The latter would require structured
> machine-readable values, otherwise it would be impossible to reliably
> implement well-defined workflows.
>
> If we implement a new freeform text field and then it gets treated as
> machine-readable by external tools, then there will be no consistency
> across different clouds, which will make it hard for operators to
> share those tools without conflicting with other uses.
>
> > OPNFV use case here is just the more specific maintenance state
> > to begin with, but who knows what one might want to implement there at
> > the end. Auto evacuate... ?
>
> Please be careful of the word evacuate, because it is ambiguous, as I
> explained in my Austin talk:
>
> https://youtu.be/lddtWUP_IKQ?t=13m8s
>
> > That is anyhow far in next steps as of complex to
> > build. It is even case specific, what to do in different scenarios:
> > - Manually do any action by admin.
> > - Automatically move VM (maybe not if problem with bigger scale)
> > - Let it stay on host over maintenance (not busy hour for service)
> > - Let VM owner remove/add VM (to host already gone through maintenance)
> > ...
>
> Yes, these are all possible scenarios. It depends very much on the
> kind of maintenance. The HA community talked about this topic a lot
> in Austin, and agreed that any solution supporting automatic workflows
> should be configurable so that each cloud operator can configure their
> cloud to behave in the way which makes the most sense for them. Our
> discussion was captured in this etherpad, although it might be
> slightly difficult to wade through for people who did not attend the
> meetings:
>
> https://etherpad.openstack.org/p/newton-instance-ha
>
> > > > For maintenance I had a session in Austin to talk with Ops and Nova
> > > > core about the maintenance part. There it was seen that Nova didn't
> > > > want more specific information about host maintenance (maintenance
> > > > state, maintenance window...), so as a result of the discussion
> > > > there is a spec that was now transferred to Ocata:
> > > >
> > > > https://review.openstack.org/310510/
> > >
> > > That's great - thanks a lot for highlighting, as it certainly seems to
> > > overlap a lot with the functionality which NTT proposed and is now
> > > described here:
> > >
> > > http://specs.openstack.org/openstack/openstack-user-stories/user-stories/proposed/ha_vm.html
> >
> > Thanks, need to familiarize into this as well as other requests in the
> > field.
>
> The talk which I mentioned above might help you get familiar with this
> area:
>
> https://www.openstack.org/videos/video/high-availability-for-pets-and-hypervisors-state-of-the-nation
>
> > > > The spec proposes a link to Nova external tool to provide more
> > > > specific information about host (compute) maintenance and by latest
> > > > comments it could have any host specific extra information to the
> > > > same place (for example you have mentioned event history). Still if
> > > > looking this kind of tool, why not make it configurable for anything
> > > > convenient for different operator scenario like automatic operations
> > > > if so wanted.
> > >
> > > Yes, that definitely makes sense to me.
> > >
> > > > Anyhow project like Nova do not want big new functionalities, so all
> > > > "more complex flows" should reside somewhere outside.
> > >
> > > Right. I can certainly understand that desire, but I'm a bit confused
> > > why the spec is proposing both extending Nova's API / DB schema *and*
> > > adding an external tool.
> >
> > I understand this point as just the text field is also usable. External
> > tool is kind of out of scope of the spec.
>
> OK, so you mean that nova just provides the mechanism for
> reading/writing the data, but it is up to operators to decide how to
> use it?
>
> > Anyhow would mention it to
> > have the understanding that the aim is to build more functionality in
> > the future into OpenStack and not to limit to what single string can
> > offer.
>
> I see. I'm a bit worried that this might turn into a mess, but I
> guess we can try it and see :-)
>
> Anyway thanks a lot for the discussion and info shared!