[sdk][ansible][inventory] Server-side filtering capability for nova API

Monty Taylor mordred at inaugust.com
Mon Apr 8 10:22:21 UTC 2019



On 4/8/19 9:04 AM, Игорь Тиунов wrote:
> Hi.
> I saw that there is some work regarding filtering in the inventory module. Also, I looked at how it will be used in future versions of ansible (os_server_facts module).
> 
> I am worried about the client-side nature of this filtering. Large deployments can have thousands of VMs in one OpenStack project, and such an environment can be very dynamic (VMs are created and deleted frequently). Inventory collection can be done from multiple ansible management nodes, and all of this can lead to a significant performance impact not only on the client side but also on the server side: a large amount of data will be requested from the nova API servers.
> I want to propose the usage of server-side filtering capability for nova API:
> 
> https://developer.openstack.org/api-ref/compute/?expanded=list-servers-detail#list-servers
> 
> These filters are applied at the database level and should give a significant performance boost during inventory collection.
> I want to propose the usage of filters in the ansible openstack inventory plugin, but first such filters should be implemented at the openstacksdk level.
> I suggest splitting filters into two parts - the api_filters and metadata_filters. The api_filters will be used for server-side filtering and metadata_filters for client-side filtering.
> I have created a snippet with prototype code. There is a _split_filters method and related changes for the list_hosts and search_hosts methods:
> 
> https://gist.github.com/ITD27M01/01bc73120bb97b237f53fa418dc83629
> 
> What do you think?

Generally speaking, I think it's a good idea. However, there are a few 
things to be aware of that make this a little more complex. Namely, in 
addition to fire-and-forget programs like the os_ modules or the ansible 
inventory code, we also support long-lived programs that do high-volume 
interactions, such as nodepool. Unfortunately, in terms of efficiency, 
these two use cases want opposite things.

For the short-term programs, as you mention, we want to maximize use of 
server-side filters. We also want to do things like use GET 
/servers/{id}. For the long-lived high-volume programs, we want to 
support a mode that combines caching and batching - rather than making 
1000 queries a second to the server, each with its own server-side 
filter conditions, it's more efficient to make one call every 5 seconds, 
cache the result, and do client-side filtering.
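
As a rough sketch of that cache-and-filter mode (the class and its 
parameters are hypothetical and only illustrate the idea, this is not 
how openstacksdk implements its caching):

```python
import time


class CachedServerList:
    """Hypothetical helper: at most one list call per TTL, filtered locally."""

    def __init__(self, fetch_all, ttl=5.0, clock=time.monotonic):
        self._fetch_all = fetch_all  # callable returning the full server list
        self._ttl = ttl              # seconds a cached list stays fresh
        self._clock = clock          # injectable so the sketch is testable
        self._servers = None
        self._fetched_at = None

    def search(self, **filters):
        now = self._clock()
        # Refresh only when there is no cached copy or it has expired.
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            self._servers = list(self._fetch_all())
            self._fetched_at = now
        # Client-side filtering: every requested key must match exactly.
        return [
            server for server in self._servers
            if all(server.get(key) == value for key, value in filters.items())
        ]
```

However many callers invoke search() within one TTL window, the server 
only sees a single list request.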

We're currently good at the second use case and less good at the first.

We took a step towards being better at the first use case with the 
use_direct_get parameter. Setting use_direct_get tells openstacksdk to 
use direct GET /servers/{id} type calls rather than filtering a full 
list locally when a single resource is wanted by id.
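
The difference between the two lookup strategies can be sketched like 
this (get_one and list_all are hypothetical stand-ins for the real API 
calls, and the function is not the actual openstacksdk code):

```python
def get_server_by_id(server_id, get_one, list_all, use_direct_get):
    """Hypothetical illustration of the two ways to look up one server."""
    if use_direct_get:
        # One targeted request, e.g. GET /servers/{id}.
        return get_one(server_id)
    # Otherwise fetch (or reuse a cached copy of) the full list
    # and pick the matching entry locally.
    for server in list_all():
        if server["id"] == server_id:
            return server
    return None
```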

We need to fix this for get_server, for the reasons in 
https://review.openstack.org/#/c/540389/.

We should also update the logic for the default value to set 
use_direct_get=True if caching is not enabled, and set it to False if it 
is - that way normal people don't have to care.

We should almost CERTAINLY start setting this parameter to True in all 
of the ansible modules and inventory (I think the cases in which we want 
use_direct_get=False for ansible are VERY explicit - they involve having 
caching enabled to use external caching like redis - and I think in 
those cases we can expect a user to set the flag explicitly in config).

So I think what we should potentially do here is expand on that config 
flag and have it push down server-side filters. In fact, if you read the 
value of that flag in your _split_filters method, then _split_filters 
could either return things in api_filters or not. Then we can take the 
idea and push it down into the underlying code so that inventory is just 
calling search_servers ... or something. I'm sure we'll need to figure 
out some details once we get into it.
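
A minimal sketch of a flag-aware split, assuming a hand-picked set of 
filter keys (API_FILTER_KEYS below is an illustrative subset, and the 
function is hypothetical - it is neither the code from the gist nor 
anything that exists in openstacksdk):

```python
# Illustrative subset of query parameters that can be sent to nova's
# list-servers call; the real set should come from the API reference.
API_FILTER_KEYS = frozenset({"name", "status", "host", "flavor", "image"})


def split_filters(filters, use_direct_get=True):
    """Return (api_filters, metadata_filters) for the given filter dict."""
    if not use_direct_get:
        # Caching mode: send nothing to the server, filter everything locally.
        return {}, dict(filters)
    api_filters = {
        key: value for key, value in filters.items() if key in API_FILTER_KEYS
    }
    metadata_filters = {
        key: value for key, value in filters.items() if key not in API_FILTER_KEYS
    }
    return api_filters, metadata_filters
```

With the flag off, api_filters comes back empty and every condition is 
applied client-side against the cached list.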

There is one additional thing that will make this slightly more complex, 
but which I believe might also allow us to do a few hard patches and get 
a LOT of benefit across the SDK.

Right now we have the 'cloud' layer which used to be the shade library, 
and we have the 'resource' layer which came from the openstacksdk 
library. They share almost nothing, other than the connection object and 
some Adapters, but we're working to change that. One of the next things
on the list is to get the 'cloud' layer to be built on top of the 
'resource' layer for server interactions. We're already using the sdk 
layer for servers: https://review.openstack.org/#/c/530770 (although as 
mentioned use_direct_get needs to be fixed) ...

What I think would be BEST would be if we taught 
openstack.resource.Resource about use_direct_get - and then put your 
_split_filters logic in there (although I believe the equivalent logic 
is already there, so it might just be a matter of harnessing it).

I think we can certainly start with inventory - as well as changing the 
default for use_direct_get in the inventory module and in 
lib/ansible/module_utils/openstack.py.

Monty


