[openstack-dev] Auditing Openstack
sean at dague.net
Wed Jul 31 02:55:37 UTC 2013
I would definitely encourage you to think about how we could apply a
tool like this in the OpenStack gate itself as you go through the
process of opening it up. If we could catch those kinds of corruptions
before the commits land, we move the cost of finding those problems way down.
It obviously won't be able to do the scale you guys are doing, but I'd
bet a large number of these corruptions are findable in the gate.
On 07/30/2013 10:48 PM, Jacob Bushman wrote:
> I haven't opened it because currently it is too tied to our proprietary
> platform. I have actually submitted a talk for the summit and planned
> on having an open version ready for this.
> It is good to hear that I am not the only one out there dealing with
> these sorts of issues and trying to find solutions.
> On 07/30/2013 05:37 PM, Joshua Harlow wrote:
>> I would love that tool, is it opened??
>> I've thought about such a tool myself actually. Something that keeps
>> enough info on the compute node to be able to analyze the actual state of
>> the cluster and find discrepancies from what the various openstack db's
>> believe is the 'state' of the clusters.
>> Seems like a great analysis tool. What corrective actions does it do (if
>> any?), aka, DB says X instances, really Y, then?? (delete them??)
>> On 7/30/13 11:59 AM, "Jacob Bushman" <jacob at bluehost.com> wrote:
>>> In our deployment we have a custom solution for the orchestration of
>>> Openstack through the API that connects with billing and other external
>>> systems on the back end.
>>> We have found that most of the corruption is introduced by messaging
>>> issues in Openstack. There are a myriad of edge cases where the status
>>> in the database can become out of sync with what is actually running on
>>> a compute node, for instance.
>>> The basic concept of the auditing tools is to compare the information in
>>> the database with the actual state of the compute node and identify
>>> discrepancies.
>>> This is accomplished by parsing the instance XML, external ids of the
>>> tap device and gathering relevant data from the compute node. Then
>>> passing this through an API to our orchestration system and using a
>>> combination of Openstack API calls and DB queries to audit the compute
>>> nodes and make sure the database and the compute nodes are in sync.
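
[Editor's sketch, not part of the original message] The comparison described
above might look roughly like the following. This is a minimal illustration
under stated assumptions, not Jacob's tool: the function names, the shape of
db_rows, and the report keys are all hypothetical, and the only input it reads
is libvirt domain XML (the <uuid> element and the <target dev=.../> of each
interface). Gathering the XML from the node and querying the real database or
API are left out.

```python
import xml.etree.ElementTree as ET


def parse_domain_xml(xml_text):
    """Return (instance UUID, sorted tap device names) from a libvirt domain XML string."""
    root = ET.fromstring(xml_text)
    uuid = root.findtext("uuid")
    taps = sorted(
        target.get("dev")
        for target in root.findall("./devices/interface/target")
        if target.get("dev")
    )
    return uuid, taps


def audit_node(db_rows, domain_xmls):
    """Compare what the database expects on a node with what is defined there.

    db_rows: hypothetical dict of instance UUID -> list of expected tap device names
    domain_xmls: list of libvirt domain XML strings gathered from the compute node
    """
    actual = dict(parse_domain_xml(x) for x in domain_xmls)
    expected_ids, actual_ids = set(db_rows), set(actual)
    return {
        # In the database but not defined on the node
        "missing_on_node": sorted(expected_ids - actual_ids),
        # Defined on the node but absent from the database
        "unknown_to_db": sorted(actual_ids - expected_ids),
        # Present in both, but the tap devices disagree
        "tap_mismatch": sorted(
            u for u in expected_ids & actual_ids
            if sorted(db_rows[u]) != actual[u]
        ),
    }


# Made-up sample data for illustration only.
SAMPLE_XML = """<domain type='kvm'>
  <name>instance-00000001</name>
  <uuid>11111111-1111-1111-1111-111111111111</uuid>
  <devices>
    <interface type='bridge'>
      <target dev='tap-aaa'/>
    </interface>
  </devices>
</domain>"""

SAMPLE_DB = {
    "11111111-1111-1111-1111-111111111111": ["tap-aaa"],
    "22222222-2222-2222-2222-222222222222": ["tap-bbb"],
}

if __name__ == "__main__":
    print(audit_node(SAMPLE_DB, [SAMPLE_XML]))
```

The sketch only reports discrepancies; what corrective action to take (if any)
is a separate decision that it deliberately leaves to the operator.
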
>>> On 07/30/2013 11:17 AM, Joshua Harlow wrote:
>>>> Do you have a writeup of the corruption issues you have seen?
>>>> I would most definitely appreciate said tools.
>>>> Any little overview of what they do/are??
>>>> On 7/30/13 9:44 AM, "Jacob Bushman" <jacob at bluehost.com> wrote:
>>>>> I have been working with various corruption issues within openstack.
>>>>> Issues like failed or partial provisions, quantum port / ip corruption
>>>>> and database corruption. There are several edge cases that I have run
>>>>> into where the existing periodic tasks to clean up corruption were
>>>>> inadequate for our use case.
>>>>> We really needed a more unified way to query through the entire stack.
>>>>> To handle this at the scale that I am working with, I have developed
>>>>> out of band auditing tools.
>>>>> I feel something like this belongs in Openstack and would be useful to
>>>>> the community. I am wondering what other tools are available and if
>>>>> this is something that is of interest.
>>>>> ~ Jacob
>>>>> OpenStack-dev mailing list
>>>>> OpenStack-dev at lists.openstack.org