[openstack-dev] [OpenStack-Dev] [Nova][Neutron][Horizon][Cinder][Keystone][Glance][Ironic][Swift] Fault Classification Input Request

Nematollah Bidokhti Nematollah.Bidokhti at huawei.com
Fri Dec 1 00:05:36 UTC 2017


Hi,

Our [Fault-Genes WG] has been working on defining fault classifications for key OpenStack projects to support OpenStack fault management and self-healing.
We have been applying unsupervised machine learning to the bugs and issues submitted by the community, but it has proven very challenging to derive a complete classification by machine alone.
We have therefore decided to use a supervised approach, which means we first need to build a labeled training data set.

We need your help to generate this training data set. We only need 2 or 3 unique fault classifications, each with a short description and its associated mitigations, from each member who is familiar with OpenStack design and operation. This way we can build a focused library of faults and mitigations for each project.
Once this data is accumulated, we will develop our own specific algorithms that can be applied to all future OpenStack issues.
Thanks in advance for your support.

No. | Project | Fault Classification | Description | Root Cause | Mitigation
1   |         |                      |             |            |
2   |         |                      |             |            |
3   |         |                      |             |            |


Below are examples provided by a couple of Neutron developers. There are certainly other types of Neutron fault classifications that this table does not capture.

Fault Classification | Root Cause | Mitigation
Network Connectivity Issues | Virtual interface in the VM is admin down | Un-shut the virtual interface
 | Virtual interface does not get an IP address via DHCP | Depends on the lower-level root cause
 | Virtual network has no interface on the router | Add the virtual network as one of the router interfaces
 | vNIC port of the VM is not active (stuck in build) | Depends on the lower-level root cause
 | Security group blocks traffic | Fix the security group to allow the relevant traffic
Unable to Add Port to Bridge | AppArmor is blocking libvirtd | Allow libvirtd in its AppArmor profile
No Valid Host Found/insufficient hypervisor resources | Compute nodes do not have sufficient resources | Free up the required compute, storage, and memory resources on the compute node
No Resource | Configuration issues | Change the config setting
Authentication/permissions error | Configuration error such as port number or password | Make sure endpoints are properly configured
Gateway access not reachable | | Use a custom keep-alive health check
 | Design issue of the OpenStack network node | Out-of-band health-checking mechanism
Security Group Misconfiguration | The security group | Change the security rules/program the security group
DNS Attack | | Implement CERT alert updates
Network design issue | Network storm | Reduce the L2 broadcast domain

Nemat




