On Wed, Mar 3, 2021 at 12:51 PM Mathieu Bultel <mbultel@redhat.com> wrote:

Hi TripleO Folks,

I'm raising this topic on the ML because we seem to disagree on the design of how the Validations should be used with and without TripleO, and I would like input from a larger audience, in particular the PTL and the cores.

The current situation is:
We have a set of "openstack tripleo validator" sub-commands to handle Validations (run, list, ...).
The Validation CLI takes several parameters as input, in particular the stack/plan, the OpenStack authentication and a static inventory file.

By asking for the stack/plan name, the CLI tries to verify that the plan or the stack is valid and that the Overcloud exists somewhere in the cloud, before passing the name to the tripleo-ansible-inventory script and trying to generate a static inventory file for whatever --plan or stack has been passed.

Sorry if this is a silly question, but can't we just make 'validate the stack status' one of the validations? In fact you already have something like that here: https://github.com/openstack/tripleo-validations/blob/1a9f1758d160cc2e543a1cf7cd4507dd3355945a/roles/stack_health/tasks/main.yml#L2 . Then only this validation would require the stack name to be passed in, instead of every validation run.
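
To illustrate the idea, here is a minimal sketch (not the actual tripleo-validations code) where each validation declares the parameters it needs, so the stack name is only demanded when a validation such as stack_health is selected. The Validation class, the CATALOGUE dict, the "stack" parameter name and the missing_params helper are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Validation:
    """A validation plus the CLI parameters it cannot run without."""
    name: str
    required_params: frozenset = field(default_factory=frozenset)


# Hypothetical catalogue: only stack_health needs the stack name.
CATALOGUE = {
    "stack_health": Validation("stack_health", frozenset({"stack"})),
    "my-validation": Validation("my-validation"),
}


def missing_params(selected, provided):
    """Map each selected validation to the parameters it still needs."""
    missing = {}
    for name in selected:
        needed = CATALOGUE[name].required_params - set(provided)
        if needed:
            missing[name] = sorted(needed)
    return missing
```

With something like this, running only "my-validation" needs no stack name at all, while selecting stack_health without one is reported up front.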

BTW as an aside we should probably remove 'plan' from that code altogether given the recent 'remove swift and overcloud plan' work from ramishra/cloudnull and co @ https://review.opendev.org/q/topic:%22env_merging%22+(status:open%20OR%20status:merged)

The code is mainly here: [1].

This behavior implies several constraints:
* The Validation CLI needs OpenStack authentication in order to do those checks.
* It introduces some complexity in the Validation code: querying Heat to get the plan name to be sure the name provided is correct, getting the status of the stack... For a Standalone deployment, it adds even more complexity.
* This code is only valid for "standard" deployments and usage, meaning it doesn't work for Standalone or for some Upgrade and FFU stages, and it needs to be bypassed for pre-undercloud deployment.
* We hit several blockers around this part of the code.

My proposal is the following:

Since we are thinking of the future of Validation and we want something more robust, stronger, simpler, usable and efficient, I propose to get rid of the plan/stack and authentication functionalities in the Validation code, and only ask for a valid inventory provided by the user.
I also propose to create a new entry point in the TripleO CLI to generate a static inventory, such as:
openstack tripleo inventory generate --output-file my-inv.yaml
and then:
openstack tripleo validator run --validation my-validation --inventory my-inv.yaml

By doing that, I think we gain a lot in simplicity and robustness, and Validation will only do what it is meant to do: wrap Ansible execution to provide better logging information and history.
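
As a rough sketch of what "only wrap Ansible execution" could look like once the inventory is the sole input: the function below takes a playbook and a user-provided inventory file, checks that the file exists, and builds the ansible-playbook command line. The build_validation_command name and the playbook file name are assumptions for this example, not the actual Validation code; the only external interface relied on is the standard "ansible-playbook -i <inventory> <playbook>" invocation.

```python
import shlex
import tempfile
from pathlib import Path


def build_validation_command(playbook, inventory):
    """Return the ansible-playbook argv for one validation playbook.

    No Heat query and no OpenStack authentication: the only input
    besides the playbook is a static inventory supplied by the user.
    Raises FileNotFoundError if the inventory file does not exist.
    """
    inventory_path = Path(inventory)
    if not inventory_path.is_file():
        raise FileNotFoundError(f"inventory file not found: {inventory_path}")
    return ["ansible-playbook", "-i", str(inventory_path), str(playbook)]


if __name__ == "__main__":
    # Stand-in for an inventory generated in a separate, earlier step.
    with tempfile.NamedTemporaryFile(suffix=".yaml") as inv:
        cmd = build_validation_command("my-validation.yaml", inv.name)
        # Print the command instead of running it; a real wrapper would
        # execute it and record the logs and run history.
        print(shlex.join(cmd))
```

The point of the sketch is that the failure mode becomes trivial to reason about: either the inventory file is there or it is not.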

The main concern about this approach is that the user will have to provide a valid inventory to the Validation CLI.
I understand the point of view of getting something fully autonomous, and the way of just kicking *one* command and the Validation can be *magically* executed against your cloud, but I think the less complex the Validation code is, the more robust, stable and usable it will be.

Moving inventory generation, which is a key part of post-deployment actions, to its own entry point also seems clearer and more robust.
This part of the code could be shared and reused for other purposes, instead of calling the inventory script stored in tripleo-validations. It could then use the tripleo-common inventory library directly with tripleoclient, instead of calling from client -> tripleo-validations/scripts -> query tripleo-common inventory library.

I know it changes the usage a little (one extra command in the execution process to get a valid inventory), but it moves us in a less buggy and less racy direction.
And the inventory should only need to be generated once, or at least only regenerated after a major change to the cloud.

So, I'd be glad to hear your thoughts and overall views on this topic.

The proposal sounds sane to me, but just to be clear, by "authentication functionalities" are you referring specifically to the '--ssh-user' argument (https://github.com/openstack/tripleo-validations/blob/1a9f1758d160cc2e543a1cf7cd4507dd3355945a/tripleo_validations/tripleo_validator.py#L243)? i.e. we will already have that in the generated static inventory, so no need to have it on the CLI?

If the only cost is that we have to have an extra step for generating the inventory then IMO it is worth doing. I would however be interested to hear from those that are objecting to the proposal about why it is a bad idea ;) since you said there has been a divergence in opinions over the design

regards, marios

Thanks,
Mathieu

[1] https://github.com/openstack/tripleo-validations/blob/master/tripleo_validations/tripleo_validator.py#L338-L382