On 4/8/19 9:04 AM, Игорь Тиунов wrote:
Hi. I saw that there is some work on filtering in the inventory module. I also looked at how it will be used in future versions of Ansible (the os_server_facts module).
I am worried about the client-side nature of this filtering. Large deployments can have thousands of VMs in one OpenStack project, and such environments can be very dynamic (VMs are created and deleted frequently). Inventory collection may also run from multiple Ansible management nodes. Together, these factors can cause a significant performance impact not only on the client side but also on the server side, because a large amount of data has to be requested from the Nova API servers. I want to propose using the server-side filtering capability of the Nova API:
https://developer.openstack.org/api-ref/compute/?expanded=list-servers-detai...
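As a rough illustration of what server-side filtering means at the HTTP level, here is a minimal sketch that builds a Nova "list servers (detailed)" request with the filters carried as query parameters, so only matching servers come back. The helper name `list_servers_url` and the endpoint URL are hypothetical; which filter keys Nova actually honors depends on the API microversion and policy, and `name` and `status` are used only as examples.

```python
from typing import Any, Dict
from urllib.parse import urlencode


def list_servers_url(compute_endpoint: str, filters: Dict[str, Any]) -> str:
    """Build a GET /servers/detail URL carrying server-side filters (hypothetical helper)."""
    # Skip unset filters so they are not sent as empty query parameters.
    params = {k: v for k, v in filters.items() if v is not None}
    # Sort for a stable query string; urlencode percent-escapes the values.
    query = urlencode(sorted(params.items()))
    url = compute_endpoint.rstrip("/") + "/servers/detail"
    return f"{url}?{query}" if query else url
```

For example, `list_servers_url("https://nova.example.com/v2.1", {"status": "ACTIVE"})` returns `https://nova.example.com/v2.1/servers/detail?status=ACTIVE`, and Nova does the filtering instead of the client.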
These filters are applied at the database level and would give a significant performance boost during inventory collection. I want to propose using them in the Ansible OpenStack inventory plugin, but first such filters should be implemented at the openstacksdk level. I suggest splitting filters into two parts: api_filters and metadata_filters. The api_filters would be used for server-side filtering and the metadata_filters for client-side filtering. I have created a snippet with prototype code. It contains a _split_filters method and related changes to the list_hosts and search_hosts methods:
https://gist.github.com/ITD27M01/01bc73120bb97b237f53fa418dc83629
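A minimal sketch of the proposed split, assuming that api_filters are the keys Nova accepts as list-servers query parameters and that metadata_filters are matched against each server's `metadata` dictionary on the client. The key set `SERVER_SIDE_KEYS` and the function names are hypothetical and are not taken from the gist.

```python
from typing import Any, Dict, Iterable, List, Tuple

# Assumed subset of filters Nova accepts as list-servers query parameters;
# the real set depends on the API microversion and policy.
SERVER_SIDE_KEYS = frozenset({"name", "status", "flavor", "image"})


def split_filters(filters: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (api_filters, metadata_filters) for the given filter dict."""
    api_filters = {k: v for k, v in filters.items() if k in SERVER_SIDE_KEYS}
    metadata_filters = {k: v for k, v in filters.items() if k not in SERVER_SIDE_KEYS}
    return api_filters, metadata_filters


def apply_metadata_filters(servers: Iterable[Dict[str, Any]],
                           metadata_filters: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Client-side step: keep servers whose metadata matches every filter."""
    return [s for s in servers
            if all(s.get("metadata", {}).get(k) == v
                   for k, v in metadata_filters.items())]
```

The api_filters dict would be sent with the list request, and apply_metadata_filters would then run only on the already-reduced response.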
What do you think?
Generally speaking, I think it's a good idea. However, there are a few things to be aware of that make this a little more complex. Namely, in addition to fire-and-forget programs like the os_ modules or the Ansible inventory code, we also support long-lived users that do high-volume interactions, such as nodepool. Unfortunately, in terms of efficiency, these two use cases want opposite things.

For the short-lived programs, as you mention, we want to maximize the use of server-side filters. We also want to do things like use GET /servers/{id}. For the long-lived, high-volume programs, we want to support a mode that combines caching and batching: rather than making 1000 queries a second to the server, each with its own server-side filter conditions, it's more efficient to make one call every 5 seconds, cache the result, and filter client-side.

We're currently good at the second use case and less good at the first. We took a step towards being better at the first with the use_direct_get parameter. use_direct_get tells openstacksdk to make direct GET /servers/{id} calls, rather than filtering a full list locally, when a single resource is wanted by id. We still need to fix this for get_server, for the reasons described in https://review.openstack.org/#/c/540389/. We should also update the default-value logic to set use_direct_get=False if caching is enabled and True if it isn't; that way normal people don't have to care. We should almost CERTAINLY start setting this parameter to True in all of the Ansible modules and the inventory. I think the cases in which we want use_direct_get=False for Ansible are VERY explicit: they involve enabling caching in order to use an external cache like redis, and in those cases we can expect the user to set the flag explicitly in config.

So I think what we should potentially do here is expand on that config flag and have it push server-side filters down.
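The trade-off described above can be sketched with a hypothetical `ServerCache` wrapper around two injected callables: one that lists all servers and one that fetches a single server by id. It refreshes the full list at most once per TTL window (5 seconds, matching the example in the message) and filters locally, while `get()` shows the `use_direct_get` choice between a targeted `GET /servers/{id}` and a lookup in the cached list. This is an illustration under those assumptions, not openstacksdk's implementation.

```python
import time
from typing import Any, Callable, Dict, List, Optional

Server = Dict[str, Any]


class ServerCache:
    """Hypothetical list-cache-filter helper for long-lived callers."""

    def __init__(self,
                 list_all: Callable[[], List[Server]],
                 get_one: Callable[[str], Optional[Server]],
                 ttl: float = 5.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._list_all = list_all    # stands in for one unfiltered list call
        self._get_one = get_one      # stands in for GET /servers/{id}
        self._ttl = ttl
        self._clock = clock
        self._servers: List[Server] = []
        self._fetched_at: Optional[float] = None

    def _refresh_if_stale(self) -> None:
        # At most one list call per TTL window, however many searches arrive.
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            self._servers = self._list_all()
            self._fetched_at = now

    def search(self, **filters: Any) -> List[Server]:
        # Client-side filtering against the cached list.
        self._refresh_if_stale()
        return [s for s in self._servers
                if all(s.get(k) == v for k, v in filters.items())]

    def get(self, server_id: str, use_direct_get: bool) -> Optional[Server]:
        if use_direct_get:
            # One targeted request; no full listing needed.
            return self._get_one(server_id)
        # Answer from the (possibly cached) full list instead.
        matches = self.search(id=server_id)
        return matches[0] if matches else None
```

Injecting the callables and the clock keeps the sketch testable without a cloud; a real client would wrap actual API calls there.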
In fact, if your _split_filters method reads the value of that flag, then it could either return things in api_filters or not. Then we can take the idea and push it down into the underlying code, so that the inventory is just calling search_servers ... or something. I'm sure we'll need to figure out some details once we get into it.

One additional thing will make this slightly more complex, but I believe it might also allow us to do a few hard patches and get a LOT of benefit across the SDK. Right now we have the 'cloud' layer, which used to be the shade library, and the 'resource' layer, which came from the openstacksdk library. They share almost nothing other than the connection object and some Adapters, but we're working to change that. One of the next things on the list is to build the 'cloud' layer on top of the 'resource' layer for server interactions. We're already using the sdk layer for servers: https://review.openstack.org/#/c/530770 (although, as mentioned, use_direct_get needs to be fixed) ...

What I think would be BEST is if we taught openstack.resource.Resource about use_direct_get and then put your _split_filters logic in there (although I believe the equivalent logic is already there, so it might just be a matter of harnessing it). I think we can certainly start with the inventory, as well as changing the default for use_direct_get in the inventory module and in lib/ansible/module_utils/openstack.py.

Monty
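To illustrate teaching a resource layer to read the flag, here is a heavily simplified, hypothetical `ResourceSketch` base class; it is not the real `openstack.resource.Resource`, whose API differs. It reuses `use_direct_get` in the expanded sense proposed in the reply: when true, filters listed in the subclass's `query_keys` are pushed to the server and the rest are applied locally; when false, everything is fetched unfiltered (so it can be cached) and filtered locally. `fetch` stands in for whatever performs the HTTP call.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for the HTTP call: takes query parameters and
# returns the matching records as dicts.
Fetch = Callable[[Dict[str, Any]], List[Dict[str, Any]]]


class ResourceSketch:
    """Simplified, hypothetical resource base class (not openstacksdk's)."""

    # Filters the service can evaluate server-side; subclasses override this.
    query_keys: frozenset = frozenset()

    @classmethod
    def list_filtered(cls, fetch: Fetch, filters: Dict[str, Any],
                      use_direct_get: bool) -> List[Dict[str, Any]]:
        if use_direct_get:
            # Short-lived callers: push every supported filter down to the server.
            api_filters = {k: v for k, v in filters.items() if k in cls.query_keys}
        else:
            # Caching callers: request the unfiltered list so it can be cached.
            api_filters = {}
        # Whatever was not pushed down is applied client-side.
        local_filters = {k: v for k, v in filters.items() if k not in api_filters}
        return [r for r in fetch(api_filters)
                if all(r.get(k) == v for k, v in local_filters.items())]


class ServerSketch(ResourceSketch):
    # Assumed subset of Nova list-servers query parameters.
    query_keys = frozenset({"name", "status"})
```

Both modes return the same records; they differ only in how much work the server does versus the client.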