[openstack-dev] [Tripleo] tripleo-cd-admins team update / contact info question

Chris Jones cmsj at tenshu.net
Wed Mar 5 00:02:19 UTC 2014


Hi

On 25 February 2014 14:30, Robert Collins <robertc at robertcollins.net> wrote:

> So - I think we need to define two things:
>   - a stock way for $randoms to ask for support w/ these clouds that
> will be fairly low latency and reliable.
>   - a way for us to escalate to each other *even if folk happen to be
> away from the keyboard at the time*.
> And possibly a third:
>   - a way for openstack-infra admins to escalate to us in the event of
> OMG things happening. Like, we send 1000 VMs all at once at their git
> mirrors or something.
>

I think action zero is to define an SLA, so everyone has a very clear
picture of what to expect from us, and we have a clear picture of what
we're signing up to provide.

Also, I'd note that talking about non-IRC escalation methods, coverage of
weekends, etc. is moving us into a pretty different realm than we have been
in, so it might be worth checking that all the current people (who might
not all have been in the meeting) are ok with fixing a cloud on a Sunday :)

Then we need to map out who can be contacted at any given time of week, and
how they can be contacted. Hopefully follow-the-sun covers us with normal
working hours, apart from the gap between US/Pacific finishing their week,
and New Zealand starting the next week. Since we're essentially relying on
volunteer efforts to service these production clouds, we would need to let
people be pretty flexible about when they can be contacted.
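
As a rough illustration of the size of that weekend gap, here is a minimal
sketch. The 09:00 to 17:00 working hours are an assumption, not something
agreed in the thread, and the UTC offsets are fixed by hand for the week of
this email (US/Pacific on standard time, New Zealand on daylight time), so
they would need adjusting for other times of year.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed UTC offsets for early March 2014:
# US/Pacific on standard time (UTC-8), New Zealand on daylight time (UTC+13).
US_PACIFIC = timezone(timedelta(hours=-8))
NEW_ZEALAND = timezone(timedelta(hours=13))


def weekend_gap(last_close: datetime, next_open: datetime) -> timedelta:
    """Return the uncovered time between the last close and the next open."""
    # Subtracting two timezone-aware datetimes compares them as instants.
    return next_open - last_close


# Hypothetical working hours: 09:00 to 17:00 local time.
pacific_close = datetime(2014, 3, 7, 17, 0, tzinfo=US_PACIFIC)  # Friday
nz_open = datetime(2014, 3, 10, 9, 0, tzinfo=NEW_ZEALAND)       # Monday

gap = weekend_gap(pacific_close, nz_open)
gap_hours = gap.total_seconds() / 3600
print(f"Uncovered weekend gap: {gap_hours:.0f} hours")
```

Under those assumptions the uncovered window is a little under two days,
which is the period any non-IRC escalation path would have to cover.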

Then we need to publish that information somewhere the relevant folk can
see it, and set up some kind of monitoring that can escalate beyond IRC if
a request isn't getting a response. James mentioned PagerDuty, and I've had
good experiences with it in previous operational roles.

Then we need to write a playbook so each outage isn't a voyage of discovery
unless it's something completely new, and commit to updating the playbook
after each outage, with what we learned that time.

Have we considered reaching out to OpenStack sponsors who have operational
folk, to see if they would be interested in contributing human resources to
this?

-- 
Cheers,

Chris